History log of /src/sys/netinet
Revision  Date  Author  Comments
 1.32 28-Oct-2022  ozaki-r Remove in_pcb_hdr.h
 1.31 20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.30 06-Sep-2018  maxv Remove the network ATM code.
 1.29 11-Jul-2018  kre Fix build. pf_ioctl.c needs netinet/in_offload.h (after previous change).
Because this is in a module, apparently, that means that netinet_in_offload.h
needs to get installed in /usr/include, so do that as well.

Feel free to fix this in a better way...
 1.28 16-Feb-2017  knakahara branches: 1.28.12; 1.28.14;
add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.27 13-Oct-2015  rjs branches: 1.27.2; 1.27.4;
Add core networking support for SCTP.
 1.26 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.25 15-Sep-2012  plunky branches: 1.25.14;

install header files from IPF 5.1.2 (sys/external/bsd/ipf) instead of
older IPF (sys/dist/ipf).

This adds ipf_rb.h
 1.24 25-Jun-2012  christos branches: 1.24.2;
rename rfc6056 -> portalgo, requested by yamt
 1.23 15-Feb-2012  riz Back out the recent import of IPFilter 5.1.1 for the upcoming branch,
which will now have IPFilter 4.1.34. IPFilter 5.1.1 will be restored
post-branch.

ok: core, releng.
 1.22 30-Jan-2012  darrenr Patch to include ipf_rb.h missed from merge.
 1.21 24-Sep-2011  christos branches: 1.21.2; 1.21.6;
install the header.
 1.20 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
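
The class selection described above can be sketched in user-space C as
follows. This is an illustrative sketch only: the names mslt_classify and
mslt_msl_seconds, the enum, and the use of a single netmask to decide
"same link/subnet" are assumptions, not the identifiers or logic of the
actual kernel change. Addresses are IPv4 in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the commit message does not give the kernel's own. */
enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE };

/* Pick a class from the nearness of the peer. */
static enum mslt_class
mslt_classify(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	if (laddr == faddr)
		return MSLT_LOOPBACK;	/* local host equals remote host */
	if ((laddr & netmask) == (faddr & netmask))
		return MSLT_LOCAL;	/* same subnet (assumed test) */
	return MSLT_REMOTE;		/* reached via one or more gateways */
}

/* Default MSL per class, in seconds, as listed in the commit message. */
static unsigned
mslt_msl_seconds(enum mslt_class c)
{
	static const unsigned msl[] = {
		[MSLT_LOOPBACK] = 2,
		[MSLT_LOCAL] = 10,
		[MSLT_REMOTE] = 60,
	};

	assert((unsigned)c < sizeof(msl) / sizeof(msl[0]));
	return msl[c];
}
```

A session between 192.168.0.1 and 192.168.0.2 with a /24 mask would fall
into the local class and use the 10-second MSL.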

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
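
The idea of linking pool elements by a narrow index rather than a pointer
can be sketched as below. Everything here is hypothetical (struct vpcb,
struct vtable, the vt_* functions, the pool and bucket sizes, and the toy
hash); it does not reproduce the VTW data layout, and the oldest-first
eviction described above is left out to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

#define NPCB     8		/* fixed pool size (assumed) */
#define NBUCKET  4		/* number of hash buckets (assumed) */
#define IDX_NONE UINT16_MAX	/* "null" link */

/* Compact session record; 'next' is a 16-bit pool index, not a pointer. */
struct vpcb {
	uint32_t faddr;
	uint16_t fport;
	uint16_t next;
};

struct vtable {
	struct vpcb pool[NPCB];
	uint16_t bucket[NBUCKET];	/* index of first entry per chain */
	uint16_t nused;
};

static void
vt_init(struct vtable *t)
{
	for (unsigned i = 0; i < NBUCKET; i++)
		t->bucket[i] = IDX_NONE;
	t->nused = 0;
}

static unsigned
vt_hash(uint32_t faddr, uint16_t fport)
{
	return (faddr ^ fport) % NBUCKET;	/* toy hash for illustration */
}

/* Returns the pool index of the new entry, or -1 if the pool is full. */
static int
vt_insert(struct vtable *t, uint32_t faddr, uint16_t fport)
{
	unsigned h = vt_hash(faddr, fport);
	uint16_t idx;

	if (t->nused == NPCB)
		return -1;
	idx = t->nused++;
	t->pool[idx].faddr = faddr;
	t->pool[idx].fport = fport;
	t->pool[idx].next = t->bucket[h];	/* push onto chain head */
	t->bucket[h] = idx;
	return idx;
}

/* Returns the pool index of the matching entry, or -1 if not found. */
static int
vt_lookup(const struct vtable *t, uint32_t faddr, uint16_t fport)
{
	for (uint16_t i = t->bucket[vt_hash(faddr, fport)]; i != IDX_NONE;
	    i = t->pool[i].next)
		if (t->pool[i].faddr == faddr && t->pool[i].fport == fport)
			return i;
	return -1;
}
```

On a 64-bit machine each link costs 2 bytes instead of 8, which is the
kind of saving the commit message attributes to index/offset references.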

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.19 05-Oct-2007  dyoung branches: 1.19.14; 1.19.44; 1.19.50;
Work in progress: use a raw socket for GRE in IP encapsulation
instead of adding/subtracting our own IPv4 header.

There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.

There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.

I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.

A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.

Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
 1.18 02-May-2007  dyoung branches: 1.18.6; 1.18.8; 1.18.10;
Remove obsolete files netinet/in_route.[ch].
 1.17 09-Dec-2006  dyoung branches: 1.17.2; 1.17.6; 1.17.8;
Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
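
The chaining and invalidation of route caches described in this commit
can be sketched in user-space C as follows. This is a simplified model,
not the kernel code: the sketch_* function names, the ro_next/ro_prevp
linkage fields, and the bare reference-count decrement standing in for
RTFREE() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct rtentry {
	int rt_refcnt;
};

struct route {
	struct rtentry *ro_rt;		/* cached route, or NULL */
	struct route *ro_next;		/* chain linkage (assumed layout) */
	struct route **ro_prevp;
};

/* Head of the chain of all caches where ro_rt != NULL. */
static struct route *rtcache_head;

/* Notify the domain of a newly cached route: add it to the chain. */
static void
sketch_rtcache(struct route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_next = rtcache_head;
	ro->ro_prevp = &rtcache_head;
	if (rtcache_head != NULL)
		rtcache_head->ro_prevp = &ro->ro_next;
	rtcache_head = ro;
}

/*
 * Invalidate one cache: drop the reference, set ro_rt to NULL, and
 * unlink it. Assumes any cache with ro_rt != NULL is on the chain.
 */
static void
sketch_rtflush(struct route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->rt_refcnt--;		/* stand-in for RTFREE() */
	ro->ro_rt = NULL;
	*ro->ro_prevp = ro->ro_next;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = ro->ro_prevp;
}

/* Invalidate every cache on the chain, e.g. when a route is added. */
static void
sketch_rtflushall(void)
{
	while (rtcache_head != NULL)
		sketch_rtflush(rtcache_head);
}
```

Users of a cache must then be prepared for ro_rt to become NULL between
uses, which is the change the commit makes to ipflow_fastforward() and
the other callers.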
 1.16 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which are built only #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
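
The link-local and RFC 1918 address predicates from item 3 can be
illustrated with the sketch below. The function names are hypothetical
and the checks take addresses in host byte order; the kernel macros
IN_LINKLOCAL() and IN_PRIVATE() may be defined differently (for example,
on network-byte-order values), so treat this only as a statement of the
address ranges involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* 169.254.0.0/16: link-local unicast. */
static bool
sketch_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* RFC 1918 private ranges: 10/8, 172.16/12, 192.168/16. */
static bool
sketch_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||
	    (a & 0xfff00000u) == 0xac100000u ||
	    (a & 0xffff0000u) == 0xc0a80000u;
}
```

For example, 169.254.1.1 satisfies the first predicate and 172.31.255.255
the second, while 172.32.0.0 satisfies neither.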
 1.15 18-May-2006  liamjfoy branches: 1.15.8; 1.15.10;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.14 11-Dec-2005  christos branches: 1.14.4; 1.14.6; 1.14.8; 1.14.12;
merge ktrace-lwp.
 1.13 09-Jul-2005  xtraeme Move ipl.h into the ipfilter block, which is the right place.
 1.12 01-May-2005  martti branches: 1.12.2;
Install netinet/ipl.h (bin/30095)
 1.11 22-Feb-2005  peter branches: 1.11.2;
Add MKIPFILTER; if set to no, don't build and install the ipf(4) programs,
headers and LKM.

Add MKPF; if set to no, don't build and install the pf(4) programs,
headers, LKM and spamd.

Both options default to yes, so nothing changed in the default build.

Reviewed by lukem.
 1.10 05-Oct-2004  yamt branches: 1.10.4; 1.10.6;
move ipf headers and add a comment.
 1.9 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.8 04-Sep-2004  manu IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.7 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.6 28-Mar-2004  martti branches: 1.6.2;
Sync with official IPFilter
 1.5 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.4 26-Nov-2002  lukem branches: 1.4.6;
Remove KDIR=, since SYS_INCLUDE=symlinks and KDIR are not supported any more.
 1.3 19-Apr-2000  itojun branches: 1.3.6;
add net/if_stf.h and netinet/ip_encap.h (almost no one will include them though)
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 12-Jun-1998  cgd branches: 1.1.10; 1.1.12;
Rework the way kernel include files are installed. In the new method,
as with user-land programs, include files are installed by each directory
in the tree that has includes to install. (This allows more flexibility
as to what gets installed, makes 'partial installs' easier, and gives us
more options as to which machines' includes get installed at any given
time.) The old SYS_INCLUDES={symlinks,copies} behaviours are _both_
still supported, though at least one bug in the 'symlinks' case is
fixed by this change. Include files can't be built before installation,
so directories that have includes as targets (e.g. dev/pci) have to move
those targets into a different Makefile.
 1.1.12.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.12.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.1.10.1 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.3.6.1 11-Dec-2002  thorpej Sync with HEAD.
 1.4.6.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.4.6.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.4.6.4 19-Oct-2004  skrll Sync with HEAD
 1.4.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.4.6.1 03-Aug-2004  skrll Sync with HEAD
 1.6.2.1 13-Aug-2004  jmc branches: 1.6.2.1.2;
Pullup rev 1.7 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.6.2.1.2.2 13-May-2005  riz Pull up revision 1.12 via patch (requested by martti in ticket #1495):
Install netinet/ipl.h (bin/30095)
 1.6.2.1.2.1 06-Feb-2005  jmc Pull up patch (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.10.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.10.4.1 29-Apr-2005  kent sync with -current
 1.11.2.2 24-Jul-2005  tron Pull up revision 1.13 (requested by peter in ticket #612):
Move ipl.h into the ipfilter block, which is the right place.
 1.11.2.1 01-May-2005  tron Pull up revision 1.12 (requested by martti in ticket #231):
Install netinet/ipl.h (bin/30095)
 1.12.2.4 27-Oct-2007  yamt sync with head.
 1.12.2.3 03-Sep-2007  yamt sync with head.
 1.12.2.2 30-Dec-2006  yamt sync with head.
 1.12.2.1 21-Jun-2006  yamt sync with head.
 1.14.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.14.8.1 24-May-2006  yamt sync with head.
 1.14.6.1 01-Jun-2006  kardel Sync with head.
 1.14.4.2 09-Sep-2006  rpaulo sync with head
 1.14.4.1 02-Feb-2006  rpaulo in_pcb_hdr.h is gone.
 1.15.10.1 10-Dec-2006  yamt sync with head.
 1.15.8.2 12-Jan-2007  ad Sync with head.
 1.15.8.1 18-Nov-2006  ad Sync with head.
 1.17.8.1 11-Jul-2007  mjf Sync with head.
 1.17.6.2 09-Oct-2007  ad Sync with head.
 1.17.6.1 08-Jun-2007  ad Sync with head.
 1.17.2.1 07-May-2007  yamt sync with head.
 1.18.10.1 06-Oct-2007  yamt sync with head.
 1.18.8.1 06-Nov-2007  matt sync with HEAD
 1.18.6.1 07-Oct-2007  joerg Sync with HEAD.
 1.19.50.1 06-Jun-2011  jruoho Sync with HEAD.
 1.19.44.1 31-May-2011  rmind sync with head
 1.19.14.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.21.6.1 18-Feb-2012  mrg merge to -current.
 1.21.2.1 30-Oct-2012  yamt sync with head
 1.24.2.2 03-Dec-2017  jdolecek update from HEAD
 1.24.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.25.14.3 28-Aug-2017  skrll Sync with HEAD
 1.25.14.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.25.14.1 06-Apr-2015  skrll Sync with HEAD
 1.27.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.27.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.28.14.1 10-Jun-2019  christos Sync with HEAD
 1.28.12.2 30-Sep-2018  pgoyette Sync with HEAD
 1.28.12.1 28-Jul-2018  pgoyette Sync with HEAD
 1.3 01-Jan-2022  andvar s/creting/creating/
 1.2 10-Aug-2008  tls branches: 1.2.2; 1.2.4; 1.2.6; 1.2.18;
Change copyright statement to NetBSD 2-clause with correct attribution.
 1.1 04-Aug-2008  tls Add accept filters, ported from FreeBSD by Coyote Point Systems. Add inetd
support for specifying an accept filter for a service (mostly as a usage
example, but it can be handy for other things). Manual pages to follow
in a day or so.

OK core@.
 1.2.18.2 04-May-2009  yamt sync with head.
 1.2.18.1 10-Aug-2008  yamt file accept_filter.h was added on branch yamt-nfs-mp on 2009-05-04 08:14:17 +0000
 1.2.6.2 19-Oct-2008  haad Sync with HEAD.
 1.2.6.1 10-Aug-2008  haad file accept_filter.h was added on branch haad-dm on 2008-10-19 22:17:46 +0000
 1.2.4.2 28-Sep-2008  mjf Sync with HEAD.
 1.2.4.1 10-Aug-2008  mjf file accept_filter.h was added on branch mjf-devfs2 on 2008-09-28 10:40:57 +0000
 1.2.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.2.2.1 10-Aug-2008  wrstuden file accept_filter.h was added on branch wrstuden-revivesa on 2008-09-18 04:37:00 +0000
 1.8 07-Jul-2016  msaitoh KNF. Remove extra spaces. No functional change.
 1.7 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.6 02-Sep-2009  tls branches: 1.6.22; 1.6.40;
Add a direction argument to socket upcalls, so they can tell why they've
been called when, for example, they're waiting for space to write. From
Ritesh Agrawal at Coyote Point.
 1.5 20-Nov-2008  ad branches: 1.5.6;
Oops, make these build.
 1.4 20-Nov-2008  ad Rename the accept filter modules to make module name match filter name.
 1.3 12-Nov-2008  ad Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.2 14-Oct-2008  ad branches: 1.2.2; 1.2.4;
Use designated initializers for struct accept_filter.
 1.1 04-Aug-2008  tls branches: 1.1.2; 1.1.4;
Add accept filters, ported from FreeBSD by Coyote Point Systems. Add inetd
support for specifying an accept filter for a service (mostly as a usage
example, but it can be handy for other things). Manual pages to follow
in a day or so.

OK core@.
 1.1.4.3 17-Jan-2009  mjf Sync with HEAD.
 1.1.4.2 28-Sep-2008  mjf Sync with HEAD.
 1.1.4.1 04-Aug-2008  mjf file accf_data.c was added on branch mjf-devfs2 on 2008-09-28 10:40:57 +0000
 1.1.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.1.2.1 04-Aug-2008  wrstuden file accf_data.c was added on branch wrstuden-revivesa on 2008-09-18 04:37:00 +0000
 1.2.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.2.2.3 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.2.2.2 19-Oct-2008  haad Sync with HEAD.
 1.2.2.1 14-Oct-2008  haad file accf_data.c was added on branch haad-dm on 2008-10-19 22:17:46 +0000
 1.5.6.3 16-Sep-2009  yamt sync with head
 1.5.6.2 04-May-2009  yamt sync with head.
 1.5.6.1 20-Nov-2008  yamt file accf_data.c was added on branch yamt-nfs-mp on 2009-05-04 08:14:17 +0000
 1.6.40.2 09-Jul-2016  skrll Sync with HEAD
 1.6.40.1 22-Sep-2015  skrll Sync with HEAD
 1.6.22.1 03-Dec-2017  jdolecek update from HEAD
 1.10 16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.9 20-Aug-2015  christos branches: 1.9.18;
include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.8 25-Feb-2014  pooka branches: 1.8.6;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.7 02-Sep-2009  tls branches: 1.7.12; 1.7.22; 1.7.26;
Add a direction argument to socket upcalls, so they can tell why they've
been called when, for example, they're waiting for space to write. From
Ritesh Agrawal at Coyote Point.
 1.6 21-Nov-2008  joerg branches: 1.6.6;
Fix indentation.
 1.5 20-Nov-2008  ad Oops, make these build.
 1.4 20-Nov-2008  ad Rename the accept filter modules to make module name match filter name.
 1.3 12-Nov-2008  ad Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.2 14-Oct-2008  ad branches: 1.2.2; 1.2.4;
Use designated initializers for struct accept_filter.
 1.1 04-Aug-2008  tls branches: 1.1.2; 1.1.4;
Add accept filters, ported from FreeBSD by Coyote Point Systems. Add inetd
support for specifying an accept filter for a service (mostly as a usage
example, but it can be handy for other things). Manual pages to follow
in a day or so.

OK core@.
 1.1.4.3 17-Jan-2009  mjf Sync with HEAD.
 1.1.4.2 28-Sep-2008  mjf Sync with HEAD.
 1.1.4.1 04-Aug-2008  mjf file accf_http.c was added on branch mjf-devfs2 on 2008-09-28 10:40:57 +0000
 1.1.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.1.2.1 04-Aug-2008  wrstuden file accf_http.c was added on branch wrstuden-revivesa on 2008-09-18 04:37:00 +0000
 1.2.4.1 19-Jan-2009  skrll Sync with HEAD.
 1.2.2.3 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.2.2.2 19-Oct-2008  haad Sync with HEAD.
 1.2.2.1 14-Oct-2008  haad file accf_http.c was added on branch haad-dm on 2008-10-19 22:17:46 +0000
 1.6.6.3 16-Sep-2009  yamt sync with head
 1.6.6.2 04-May-2009  yamt sync with head.
 1.6.6.1 21-Nov-2008  yamt file accf_http.c was added on branch yamt-nfs-mp on 2009-05-04 08:14:17 +0000
 1.7.26.1 18-May-2014  rmind sync with head
 1.7.22.2 03-Dec-2017  jdolecek update from HEAD
 1.7.22.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.12.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.6.1 22-Sep-2015  skrll Sync with HEAD
 1.9.18.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2 28-Aug-2018  rin No need to update mlen also in the case of (mlen & 16) != 0.
 1.1 25-Jan-2008  joerg branches: 1.1.2; 1.1.4; 1.1.12; 1.1.102; 1.1.104;
Refactor in_cksum/in4_cksum/in6_cksum implementations:
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core that implements the ones' complement sum.
The default implementation is moderately fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.

This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
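
A ones' complement sum with a 32-bit accumulator and 16-bit addends, as
described for L32 platforms, can be sketched over a flat buffer as shown
here. This is the textbook Internet checksum rather than the
cpu_in_cksum implementation: the function name is hypothetical, it takes
a contiguous buffer instead of an mbuf chain, and the result is returned
in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t
sketch_in_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;	/* 32-bit accumulator */

	assert(buf != NULL || len == 0);

	while (len > 1) {
		/* Add one 16-bit word, read in network (big-endian) order. */
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
		/* Fold early so very long inputs cannot overflow. */
		if (sum & 0x80000000u)
			sum = (sum & 0xffffu) + (sum >> 16);
	}
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;	/* pad odd byte with zero */

	/* Fold the carries back in (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Appending the returned checksum to the data in big-endian order and
summing again yields zero, which is how a receiver verifies it.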
 1.1.104.1 10-Jun-2019  christos Sync with HEAD
 1.1.102.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.1.12.2 23-Mar-2008  matt sync with HEAD
 1.1.12.1 25-Jan-2008  matt file cpu_in_cksum.c was added on branch matt-armv6 on 2008-03-23 02:05:06 +0000
 1.1.4.2 18-Feb-2008  mjf Sync with HEAD.
 1.1.4.1 25-Jan-2008  mjf file cpu_in_cksum.c was added on branch mjf-devfs on 2008-02-18 21:07:08 +0000
 1.1.2.2 04-Feb-2008  yamt sync with head.
 1.1.2.1 25-Jan-2008  yamt file cpu_in_cksum.c was added on branch yamt-lazymbuf on 2008-02-04 09:24:39 +0000
 1.1 10-Feb-2015  rjs branches: 1.1.2; 1.1.18;
Add DCCP protocol support from KAME.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 10-Feb-2015  jdolecek file dccp.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp.h was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.3 26-Apr-2016  ozaki-r branches: 1.3.16;
Sweep unnecessary route.h inclusions
 1.2 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.1 10-Feb-2015  rjs branches: 1.1.2;
Add DCCP protocol support from KAME.
 1.1.2.4 29-May-2016  skrll Sync with HEAD
 1.1.2.3 22-Sep-2015  skrll Sync with HEAD
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp_cc_sw.c was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.3.16.2 03-Dec-2017  jdolecek update from HEAD
 1.3.16.1 26-Apr-2016  jdolecek file dccp_cc_sw.c was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.1 10-Feb-2015  rjs branches: 1.1.2; 1.1.18;
Add DCCP protocol support from KAME.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 10-Feb-2015  jdolecek file dccp_cc_sw.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp_cc_sw.h was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.5 22-May-2022  andvar fix various small typos, mainly in comments.
 1.4 04-Jun-2019  msaitoh Fix typo (s/recevie/receive/).
 1.3 26-Apr-2016  ozaki-r branches: 1.3.16; 1.3.20;
Sweep unnecessary route.h inclusions
 1.2 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.1 10-Feb-2015  rjs branches: 1.1.2;
Add DCCP protocol support from KAME.
 1.1.2.4 29-May-2016  skrll Sync with HEAD
 1.1.2.3 22-Sep-2015  skrll Sync with HEAD
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp_tcplike.c was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.3.20.1 10-Jun-2019  christos Sync with HEAD
 1.3.16.2 03-Dec-2017  jdolecek update from HEAD
 1.3.16.1 26-Apr-2016  jdolecek file dccp_tcplike.c was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.3 04-Jun-2019  msaitoh Fix typo (s/recevie/receive/).
 1.2 07-Jul-2016  msaitoh branches: 1.2.16; 1.2.20;
KNF. Remove extra spaces. No functional change.
 1.1 10-Feb-2015  rjs branches: 1.1.2;
Add DCCP protocol support from KAME.
 1.1.2.3 09-Jul-2016  skrll Sync with HEAD
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp_tcplike.h was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.2.20.1 10-Jun-2019  christos Sync with HEAD
 1.2.16.2 03-Dec-2017  jdolecek update from HEAD
 1.2.16.1 07-Jul-2016  jdolecek file dccp_tcplike.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.12 12-May-2024  msaitoh s/unitialized/uninitialized/
 1.11 14-Aug-2023  mrg avoid uninitialised variable use.

found by GCC 12.
 1.10 07-Aug-2023  mrg fix indentation issues.

found by GCC 12.
 1.9 10-Dec-2021  andvar branches: 1.9.4;
s/occured/occurred/ in comments, log messages and man pages.
 1.8 05-Dec-2021  msaitoh s/receieve/receive/
 1.7 07-Sep-2021  andvar s/aquire/acquire/ in comments, also one typo fix acqure->acquire.
 1.6 27-Dec-2019  msaitoh s/inital/initial/
 1.5 04-Jun-2019  msaitoh Fix typo (s/recevie/receive/).
 1.4 07-Jul-2016  msaitoh branches: 1.4.16; 1.4.20;
KNF. Remove extra spaces. No functional change.
 1.3 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.2 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.1 10-Feb-2015  rjs branches: 1.1.2;
Add DCCP protocol support from KAME.
 1.1.2.5 09-Jul-2016  skrll Sync with HEAD
 1.1.2.4 29-May-2016  skrll Sync with HEAD
 1.1.2.3 22-Sep-2015  skrll Sync with HEAD
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp_tfrc.c was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.4.20.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.4.20.1 10-Jun-2019  christos Sync with HEAD
 1.4.16.2 03-Dec-2017  jdolecek update from HEAD
 1.4.16.1 07-Jul-2016  jdolecek file dccp_tfrc.c was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.9.4.1 13-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #862):

sys/netinet/dccp_tfrc.c: revision 1.11

avoid uninitialised variable use.
found by GCC 12.
 1.3 04-Jun-2019  msaitoh Fix typo (s/recevie/receive/).
 1.2 07-Jul-2016  msaitoh branches: 1.2.16; 1.2.20;
KNF. Remove extra spaces. No functional change.
 1.1 10-Feb-2015  rjs branches: 1.1.2;
Add DCCP protocol support from KAME.
 1.1.2.3 09-Jul-2016  skrll Sync with HEAD
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp_tfrc.h was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.2.20.1 10-Jun-2019  christos Sync with HEAD
 1.2.16.2 03-Dec-2017  jdolecek update from HEAD
 1.2.16.1 07-Jul-2016  jdolecek file dccp_tfrc.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.1 10-Feb-2015  rjs branches: 1.1.2; 1.1.18;
Add DCCP protocol support from KAME.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 10-Feb-2015  jdolecek file dccp_tfrc_lookup.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp_tfrc_lookup.h was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.27 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.26 04-Nov-2022  ozaki-r branches: 1.26.8;
inpcb: rename functions to in6pcb_*
 1.25 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.24 28-Oct-2022  ozaki-r Adjust dccp and sctp for struct inpcb separation
 1.23 28-Oct-2022  ozaki-r Adjust pf, wg, dccp and sctp for struct inpcb integration
 1.22 04-Dec-2021  andvar fix typos in comments and log messages, mainly in establish(ed).
 1.21 16-Dec-2018  christos sbspace() does not return negative values anymore and that broke OOB data
sending. Instead of depending on negative values, account for the 1024
bytes sosend() adds so that it can use all the space here in a separate
function sbspace_oob(). Idea from mlelstv@
 1.20 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.19 18-May-2018  maxv branches: 1.19.2;
IP6_EXTHDR_GET -> M_REGION_GET, no functional change.
 1.18 03-May-2018  maxv Remove m_copy completely.
 1.17 08-Feb-2018  dholland branches: 1.17.2;
Typos.
 1.16 07-May-2017  rjs branches: 1.16.8;
Change bzero -> memset, bcopy -> memcpy.
 1.15 07-May-2017  rjs Change SPL around call to in_pcbbind().
 1.14 07-May-2017  rjs Remove some foreign conditional code, NFC intended.
 1.13 03-Mar-2017  ozaki-r branches: 1.13.4;
Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is a layer violation and is even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.12 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.11 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.10 13-Dec-2016  ozaki-r branches: 1.10.2;
Remove unnecessary inclusions of nd6.h
 1.9 07-Jul-2016  msaitoh branches: 1.9.2;
KNF. Remove extra spaces. No functional change.
 1.8 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receive interface of an mbuf.
The functions are counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif handling and reduce the diff of
the upcoming change.

No functional change.
 1.7 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.6 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.5 27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.4 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.3 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.2 04-Apr-2015  rtr branches: 1.2.2;
* update dccp_bind for struct mbuf * to struct sockaddr * parameter change
* pass NULL instead of casting 0 to a pointer when calling in_pcbbind()
 1.1 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.2.2.7 28-Aug-2017  skrll Sync with HEAD
 1.2.2.6 05-Feb-2017  skrll Sync with HEAD
 1.2.2.5 09-Jul-2016  skrll Sync with HEAD
 1.2.2.4 22-Sep-2015  skrll Sync with HEAD
 1.2.2.3 06-Jun-2015  skrll Sync with HEAD
 1.2.2.2 06-Apr-2015  skrll Sync with HEAD
 1.2.2.1 04-Apr-2015  skrll file dccp_usrreq.c was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.9.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.9.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.10.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.13.4.1 11-May-2017  pgoyette Sync with HEAD
 1.16.8.2 03-Dec-2017  jdolecek update from HEAD
 1.16.8.1 07-May-2017  jdolecek file dccp_usrreq.c was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.17.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.17.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.17.2.1 21-May-2018  pgoyette Sync with HEAD
 1.19.2.1 10-Jun-2019  christos Sync with HEAD
 1.26.8.1 02-Aug-2025  perseant Sync with HEAD
 1.7 28-Oct-2022  ozaki-r Adjust pf, wg, dccp and sctp for struct inpcb integration
 1.6 07-Nov-2021  andvar fix various typos, mainly s/prefered/preferred/
 1.5 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.4 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.3 07-Jul-2016  msaitoh branches: 1.3.16; 1.3.18; 1.3.20;
KNF. Remove extra spaces. No functional change.
 1.2 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.1 10-Feb-2015  rjs branches: 1.1.2;
Add DCCP protocol support from KAME.
 1.1.2.4 09-Jul-2016  skrll Sync with HEAD
 1.1.2.3 06-Jun-2015  skrll Sync with HEAD
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 10-Feb-2015  skrll file dccp_var.h was added on branch nick-nhusb on 2015-04-06 15:18:22 +0000
 1.3.20.1 10-Jun-2019  christos Sync with HEAD
 1.3.18.2 30-Sep-2018  pgoyette Sync with HEAD
 1.3.18.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.3.16.2 03-Dec-2017  jdolecek update from HEAD
 1.3.16.1 07-Jul-2016  jdolecek file dccp_var.h was added on branch tls-maxphys on 2017-12-03 11:39:03 +0000
 1.71 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.70 06-Sep-2004  darrenr Do not allow packets flagged with "out-of-window" (oow) to match "keep state"
rules and try to prevent such rules ("keep state with oow") from being loaded
into the kernel.

Pr: kern/26581
 1.69 03-Sep-2004  smb Don't try and add a state session if the packet has already been checked
and marked as out of window - trying to do the add will result in a failure
and the packet being blocked, incorrectly.

Committed By: darrenr
Tested By: smb
 1.68 22-Aug-2004  chs fix m_pulldown() usage, it's different from m_pullup().
fixes PRs 26666 and 26701.
 1.67 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.66 16-Jun-2004  tron Correct two errors in fr_check():
1.) Make sure that "pass" is always initialized.
2.) Make sure the code doesn't use a stale mbuf pointer after fr_makefrip()
has been called. This fixes PR kern/25868.

Analyzed and reviewed by Steve Woodford.
 1.65 20-May-2004  christos PR/25622: IPV6 return RST and through cloned interfaces was broken.
- checksum was computed incorrectly.
- ipv6 packet was not initialized properly.
- fixed code to be more similar to the v4 counterpart.
 1.64 10-May-2004  christos PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.63 04-May-2004  skd Fix to update all references to mbuf. Fixes case where mbuf is freed twice.
 1.62 01-Apr-2004  martin A few more ioctl vs. copyin changes, spotted by Bill Studenmund.
 1.61 28-Mar-2004  martti branches: 1.61.2;
Sync with official IPFilter
 1.60 28-Mar-2004  martti Upgraded IPFilter to 4.1.1
 1.59 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.58 19-Sep-2002  martti branches: 1.58.6;
Resync with official IPF
 1.57 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.56 09-Jun-2002  itojun whitespace
 1.55 02-May-2002  martti branches: 1.55.2; 1.55.4;
Fix compilation problems
 1.54 02-May-2002  martti Upgraded IPFilter to 3.4.27
 1.53 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.52 24-Jan-2002  martti Re-sync with IPFilter
 1.51 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.50 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.49 13-Nov-2001  lukem add RCSIDs
 1.48 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.47 02-Jun-2001  thorpej branches: 1.47.2; 1.47.6;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.
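
The pre-computed pseudo-header checksum described above can be illustrated with a small user-space sketch. This is not the kernel code: the function names (cksum_fold, pseudo_hdr_sum, cksum_finish) are hypothetical, addresses are assumed to be in host byte order, and only IPv4 is covered. The idea is that the pseudo-header sum is computed once and cached, and the remaining sum over the transport header and payload is added later, either by software or by the network interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits using end-around carry. */
static uint16_t
cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Folded, uncomplemented ones'-complement sum of the IPv4 pseudo-header.
 * src and dst are in host byte order; len is the transport-layer length
 * (header plus payload). This is the value that would be cached.
 */
static uint16_t
pseudo_hdr_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	uint32_t sum = (src >> 16) + (src & 0xffff) +
	    (dst >> 16) + (dst & 0xffff) + proto + len;

	return cksum_fold(sum);
}

/*
 * Delayed step: add the transport header and payload (read as big-endian
 * 16-bit words) to the cached partial sum and return the final checksum.
 */
static uint16_t
cksum_finish(uint16_t partial, const uint8_t *data, size_t len)
{
	uint32_t sum = partial;
	size_t i;

	assert(data != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)			/* odd trailing byte, zero-padded */
		sum += (uint32_t)data[i] << 8;
	return (uint16_t)~cksum_fold(sum);
}
```

A receiver can reuse the same two steps for verification: summing a segment that already carries a correct checksum yields 0.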

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
 1.46 26-Mar-2001  mike Resolve conflicts.
 1.45 05-Feb-2001  chs branches: 1.45.2;
expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.44 18-Jan-2001  jdolecek constify
 1.43 12-Nov-2000  thorpej Due to a quirk (err, bug?) in IP Filter (mbuf freed without setting *mp
to NULL), the NULL check is insufficient. Also make sure fr_check()
returned 0.
 1.42 12-Nov-2000  thorpej Oops, the mbuf may have been freed -- do a NULL check in the wrapper.
 1.41 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
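
The hook shape listed above (a filter argument, an mbuf **, an ifnet *, and a direction, registered under a "key" and a "dlt") might be sketched in user space as follows. All names here (hook_add, hooks_run, HOOK_IN, KEY_INET, DLT_RAWIP and so on) are hypothetical stand-ins, not the kernel's pfil interface, and struct mbuf and struct ifnet are left opaque.

```c
#include <assert.h>
#include <stddef.h>

struct mbuf;			/* opaque stand-ins for kernel types */
struct ifnet;

/* Hypothetical constants for illustration only. */
enum { HOOK_IN = 1, HOOK_OUT = 2 };
enum { KEY_INET = 1, KEY_INET6 = 2 };
enum { DLT_RAWIP = 100 };

typedef int (*hook_fn)(void *arg, struct mbuf **mp, struct ifnet *ifp,
    int dir);

struct hook {
	hook_fn fn;
	void *arg;
	int key;
	int dlt;
};

#define MAX_HOOKS 8
static struct hook hooks[MAX_HOOKS];
static size_t nhooks;

/* Register a hook under (key, dlt); returns 0 on success, -1 if full. */
static int
hook_add(hook_fn fn, void *arg, int key, int dlt)
{
	assert(fn != NULL);
	if (nhooks == MAX_HOOKS)
		return -1;
	hooks[nhooks++] = (struct hook){ fn, arg, key, dlt };
	return 0;
}

/*
 * Run every hook registered under (key, dlt), in registration order.
 * Stops early if a hook returns non-zero or consumes the packet by
 * setting *mp to NULL; the caller is expected to check both.
 */
static int
hooks_run(int key, int dlt, struct mbuf **mp, struct ifnet *ifp, int dir)
{
	for (size_t i = 0; i < nhooks; i++) {
		if (hooks[i].key != key || hooks[i].dlt != dlt)
			continue;
		int rv = hooks[i].fn(hooks[i].arg, mp, ifp, dir);
		if (rv != 0 || *mp == NULL)
			return rv;
	}
	return 0;
}

/* Example hook: counts the packets it sees and lets them pass. */
static int
count_hook(void *arg, struct mbuf **mp, struct ifnet *ifp, int dir)
{
	(void)mp; (void)ifp; (void)dir;
	++*(int *)arg;
	return 0;
}
```

Passing the mbuf by reference is what lets a hook replace or free the packet, which is why a caller in this sketch has to look at both the return value and *mp afterwards.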
 1.40 08-Oct-2000  itojun ipfilter currently supports IPv4 only. do not try to touch non-IPv4
packets. PR 11082.

This is a short-term workaround. whenever new ipfilter comes out with
proper non-IPv4 support, we should migrate to the new ipfilter.
 1.39 12-Aug-2000  enami Put # endif directive after the right (i.e., matching) close brace
to prevent compilation error.
 1.38 12-Aug-2000  veego Protect a IPLLOG with ifdef IPFILTER_LOG. Patch from Darren Reed.
 1.37 09-Aug-2000  veego Resolve conflicts.
 1.36 12-Jun-2000  veego branches: 1.36.2;
Resolve conflicts.
 1.35 23-May-2000  veego branches: 1.35.2;
Resolve conflicts.
 1.34 21-May-2000  veego Resolve conflicts.
 1.33 11-May-2000  veego Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.32 10-May-2000  itojun correct out-of-bound access when hlen == 1 and opt > 1.
reviewed by darren, darren committed to freebsd fil.c (1.12 -> 1.13)
so it should be correct enough.
 1.31 03-May-2000  veego Resolve conflicts.
 1.30 30-Mar-2000  augustss Remove register declarations.
 1.29 01-Feb-2000  veego Resolve conflicts.
 1.28 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.27 02-Feb-1999  cjs branches: 1.27.2; 1.27.6; 1.27.8; 1.27.14;
Remove SCCS markers and make these compile in $NetBSD$ IDs.
 1.26 23-Jan-1999  mycroft Fix problems with fr_tcpsum() that prevented the FTP proxy from working.
 1.25 26-Nov-1998  mrg add a patch from darren reed, to make ipfilter use our cksum routine.
 1.24 22-Nov-1998  mrg merge ipf 3.2.10
 1.23 12-Jul-1998  veego Resolve conflicts from the import.
 1.22 31-May-1998  cgd Another demonstration that when you're converting variables from 'long's
to fixed 32-bit integers, you have to exercise care.
 1.21 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.20 17-May-1998  veego Resolve conflicts
 1.19 01-May-1998  thorpej If packets are passed through IP Filter at all, don't allow fast-forward
flow entries to be created for them.

Eventually, IP Filter should be extended to allow IP src/dst pairs to
be specified as "fast forward OK".
 1.18 17-Nov-1997  mrg fix checksum problems (from marc boucher via darren reed).
 1.17 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.16 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.15 21-Sep-1997  veego branches: 1.15.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.14 08-Jul-1997  mrg branches: 1.14.2;
put back IPFILTER_DEFAULT_BLOCK, as documented in options(4).
 1.13 06-Jul-1997  thorpej Restore original RCS IDs.
 1.12 06-Jul-1997  thorpej Fix a bug caught by gcc: add parenthesis to properly group a test.
 1.11 05-Jul-1997  darrenr fix conflicts from import
 1.10 16-Jun-1997  mrg make it "options IPFILTER_DEFAULT_BLOCK".
 1.9 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.8 27-May-1997  thorpej Make this compile on 32-bit architectures again:
- garbage-collect unused variables, or #ifdef them as appropriate.
 1.7 25-May-1997  darrenr fix conflicts
 1.6 29-Mar-1997  darrenr use IPLLOG instead of ipllog to easily mask parameters, fix up prototype
problems for compiling to user programs.
 1.5 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.4 19-Feb-1997  scottr Don't include ipfilter.h if building an LKM.
 1.3 18-Feb-1997  mrg pseudo-device ipfilter brings in PFIL_HOOKS.
 1.2 05-Jan-1997  veego branches: 1.2.4;
Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.27 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.26 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.25 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.24 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.23 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.22 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.21 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.20 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.19 12-Jun-2000  veego Import IP Filter 3.4.6
 1.1.1.18 23-May-2000  veego Import IP Filter 3.4.4
 1.1.1.17 21-May-2000  veego Import IP Filter 3.4.3
 1.1.1.16 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.15 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.14 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.13 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.12 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.11 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.10 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.2.4.1 12-Mar-1997  is Merge in changes from Trunk
 1.14.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.15.2.6 29-Nov-1998  cgd pull up rev 1.25 from trunk (mrg)
 1.15.2.5 24-Nov-1998  cgd pull up rev(s) 1.24 from trunk (ipfilter 3.2.10). (mrg)
 1.15.2.4 22-Sep-1998  cgd Redo previous change (Pull up 1.20-1.23 (veego)). Previous commit actually
pulled up 1.21-1.23, and incorrectly omitted the changes in rev 1.20.
 1.15.2.3 17-Sep-1998  mellon Pull up 1.20-1.23 (veego)
 1.15.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.15.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.27.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.27.8.4 27-Mar-2001  bouyer Sync with HEAD.
 1.27.8.3 11-Feb-2001  bouyer Sync with HEAD.
 1.27.8.2 22-Nov-2000  bouyer Sync with HEAD.
 1.27.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.27.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.27.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.27.2.2 13-May-2000  he Pull up revision 1.32 (requested by darrenr):
Fix bug in dealing with large offsets inside very small options.
 1.27.2.1 20-Dec-1999  he Pull up revision 1.28 (requested by darrenr):
Update IPF to version 3.3.5.
 1.35.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.36.2.5 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.36.2.4 09-Feb-2002  he Pull up revisions 1.41-1.52 (requested by martti):
Updated IPFilter to 3.4.23.
 1.36.2.3 08-Oct-2000  itojun pullup 1.39 -> 1.40 (approved by releng-1-5)

ipfilter currently supports IPv4 only. do not try to touch non-IPv4
packets. PR 11082.

This is a short-term workaround. whenever new ipfilter comes out with
proper non-IPv4 support, we should migrate to the new ipfilter.
 1.36.2.2 31-Aug-2000  veego Pull up revisions 1.38-1.39 (requested by veego). Approved by releng-1-5.

>syssrc/sys/netinet/fil.c 1.38
>Committed By: veego
>Log Message:
>Protect a IPLLOG with ifdef IPFILTER_LOG. Patch from Darren Reed.

>syssrc/sys/netinet/fil.c 1.39
>Committed By: enami
>Log Message:
>Put # endif directive after the right (i.e., matching) close brace
>to prevent compilation error.
 1.36.2.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Caseper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Caseper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.45.2.9 20-Sep-2002  thorpej Sync with HEAD.
 1.45.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.45.2.7 04-May-2002  thorpej Update from trunk.
 1.45.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.45.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.45.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.45.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.45.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.45.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.47.6.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.47.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.47.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.47.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.47.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.55.4.2 26-Oct-2005  riz Apply patch (requested by darrenr in ticket #1780):
Don't pass a NULL pointer to ipfr_fastroute().
From Christoph Egger in PR#26141
 1.55.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.55.2.1 20-Jun-2002  gehenna catch up with -current.
 1.58.6.6 19-Oct-2004  skrll Sync with HEAD
 1.58.6.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.58.6.4 18-Sep-2004  skrll Sync with HEAD.
 1.58.6.3 03-Sep-2004  skrll Sync with HEAD
 1.58.6.2 25-Aug-2004  skrll Sync with HEAD.
 1.58.6.1 03-Aug-2004  skrll Sync with HEAD
 1.61.2.10 10-Jan-2005  jmc Pullup patch (requested by hubertf in ticket #1068)

Pull revs 1.2-1.7 from sys/dist/ipf/netinet/fil.c from trunk to fix panics
with ipf and IPv6. Fixes PR#28875 and #26839
 1.61.2.9 12-Nov-2004  jmc branches: 1.61.2.9.2;
Pullup patch (requested by darrenr in ticket #910)

Fix for previous revision was incorrect. it checks for FI_OOW regardless of what
type of data is stored in the rule (only a valid check for FR_T_IPF rules.)
 1.61.2.8 04-Oct-2004  jmc Pullup rev 1.69-1.70+patch (requested by jdolecek in ticket #888)

Do not allow packets flagged with "out-of-window" (oow) to match "keep state"
rules and try to prevent such rules ("keep state with oow") from being loaded
into the kernel. PR#26581
 1.61.2.7 23-Aug-2004  tron Pull up revision 1.68 (requested by chs in ticket #783):
fix m_pulldown() usage, it's different from m_pullup().
fixes PRs 26666 and 26701.
 1.61.2.6 13-Aug-2004  jmc Pullup rev 1.67 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.61.2.5 18-Jun-2004  grant Pull up revision 1.66 (requested by tron in ticket #502):

Correct two errors in fr_check():
1.) Make sure that "pass" is always initialized.
2.) Make sure the code doesn't use a stale mbuf pointer after fr_makefrip()
has been called. This fixes PR kern/25868.
 1.61.2.4 30-May-2004  tron Pull up revision 1.65 (requested by christos in ticket #416):
PR/25622: IPV6 return RST and through cloned interfaces was broken.
- checksum was computed incorrectly.
- ipv6 packet was not initialized properly.
- fixed code to be more similar to the v4 counterpart.
 1.61.2.3 30-May-2004  tron Pull up revision 1.64 (requested by christos in ticket #416):
PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.61.2.2 30-May-2004  tron Pull up revision 1.63 (requested by christos in ticket #416):
Fix to update all references to mbuf. Fixes case where mbuf is freed twice.
 1.61.2.1 02-Apr-2004  tron Pull up revision 1.62 (requested by martin in ticket #46):
A few more ioctl vs. copyin changes, spotted by Bill Studenmund.
 1.61.2.9.2.1 06-Feb-2005  jmc Pull up revision 1.71 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.16 04-Dec-2020  thorpej Build ip_sync.c with -Wno-error to avoid failing due to excessive stack
usage.
 1.15 23-Mar-2012  christos branches: 1.15.52;
move back to 5.1.1 in the new place
 1.14 15-Feb-2012  riz Back out the recent import of IPFilter 5.1.1 for the upcoming branch,
which will now have IPFilter 4.1.34. IPFilter 5.1.1 will be restored
post-branch.

ok: core, releng.
 1.13 30-Jan-2012  darrenr New files required to build ipfilter into the kernel
 1.12 02-Oct-2010  bad branches: 1.12.8; 1.12.12;
Defopt the rest of the Ipfilter options and tunables.
Per discussion with darrenr@ a year ago.
 1.11 17-Apr-2010  darrenr fix spelling mistake: netient -> netinet
 1.10 17-Apr-2010  darrenr add IPFILTER_COMPAT to kernel config options recognised for IPFilter
 1.9 24-Jan-2010  pooka branches: 1.9.2; 1.9.4;
ipfilter depends on bpf_filter, not bpfilter (since the year 2000).
 1.8 17-Sep-2006  yamt branches: 1.8.54;
defflag IPFILTER_LOOKUP.
 1.7 11-Dec-2005  christos branches: 1.7.20;
merge ktrace-lwp.
 1.6 26-Mar-2005  christos branches: 1.6.2;
defopt IPFILTER_DEFAULT_BLOCK
 1.5 01-Oct-2004  christos branches: 1.5.4;
Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.4 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.3 28-Mar-2004  martti branches: 1.3.2;
Sync with official IPFilter
 1.2 11-Oct-2002  thorpej branches: 1.2.2; 1.2.8;
Add missing "needs-flag".
 1.1 10-Oct-2002  thorpej Move netinet, netinet6, ipsec, and ipfilter config defns to
netinet/files.ipfilter, netinet/files.netinet, netinet6/files.netinet6,
and netinet6/files.netipsec.

XXX There are still a few stragglers in conf/files, which are entangled
with other network protocols.
 1.2.8.3 01-Apr-2005  skrll Sync with HEAD.
 1.2.8.2 19-Oct-2004  skrll Sync with HEAD
 1.2.8.1 03-Aug-2004  skrll Sync with HEAD
 1.2.2.2 18-Oct-2002  nathanw Catch up to -current.
 1.2.2.1 11-Oct-2002  nathanw file files.ipfilter was added on branch nathanw_sa on 2002-10-18 02:45:16 +0000
 1.3.2.1 13-Aug-2004  jmc branches: 1.3.2.1.2;
Pullup rev 1.4 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.3.2.1.2.1 06-Feb-2005  jmc Pull up patch (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.5.4.1 29-Apr-2005  kent sync with -current
 1.6.2.1 30-Dec-2006  yamt sync with head.
 1.7.20.1 18-Nov-2006  ad Sync with head.
 1.8.54.3 09-Oct-2010  yamt sync with head
 1.8.54.2 11-Aug-2010  yamt sync with head.
 1.8.54.1 11-Mar-2010  yamt sync with head
 1.9.4.2 05-Mar-2011  rmind sync with head
 1.9.4.1 30-May-2010  rmind sync with head
 1.9.2.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.9.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.12.12.2 05-Apr-2012  mrg sync to latest -current.
 1.12.12.1 18-Feb-2012  mrg merge to -current.
 1.12.8.1 17-Apr-2012  yamt sync with head
 1.15.52.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.30 20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.29 08-Mar-2021  christos remove now unused pseudo-random ip id code.
 1.28 29-Jul-2017  maxv branches: 1.28.16;
Remove TCP_COMPAT_42.
 1.27 13-Oct-2015  rjs Add core networking support for SCTP.
 1.26 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.25 02-Dec-2014  christos add routines to print in_addr and sockaddr_in (in_print and sin_print)
 1.24 25-Jun-2012  christos branches: 1.24.2; 1.24.16;
rename rfc6056 -> portalgo, requested by yamt
 1.23 24-Sep-2011  christos branches: 1.23.2;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.22 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
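
The class selection described above can be sketched in a few lines. This is an illustration only: the names (mslt_classify, mslt_msl) are hypothetical, addresses are IPv4 in host byte order, and "local" is approximated as "same subnet under a caller-supplied netmask" rather than whatever test the kernel actually performs. The default MSL values are the ones given in this log message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; not the kernel's identifiers. */
enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE, MSLT_NCLASSES };

/* Default MSL per class, in seconds: loopback, local, remote. */
static const unsigned mslt_default_msl[MSLT_NCLASSES] = { 2, 10, 60 };

/* Classify a session by the nearness of the peer. */
static enum mslt_class
mslt_classify(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	if (laddr == faddr)
		return MSLT_LOOPBACK;	/* local host equals remote host */
	if ((laddr & netmask) == (faddr & netmask))
		return MSLT_LOCAL;	/* same link/subnet */
	return MSLT_REMOTE;		/* reached via a gateway */
}

/* MSL (in seconds) that a session in TIME_WAIT would use. */
static unsigned
mslt_msl(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	enum mslt_class c = mslt_classify(laddr, faddr, netmask);

	assert(c < MSLT_NCLASSES);
	return mslt_default_msl[c];
}
```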

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
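
A minimal sketch of the two ideas in this paragraph, narrow indices into a
fixed-size pool and oldest-first discard, might look like this. All ex_*
names, the tiny pool size, and the single 32-bit key standing in for a
connection are assumptions for illustration. It does not attempt the
cacheline-aware hash layout of the real VTW code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_POOL_SIZE 4          /* tiny, so that eviction is easy to observe */
#define EX_NBUCKETS  2
#define EX_NIL       UINT16_MAX /* "null" index, used instead of a NULL pointer */

struct ex_vpcb {
	uint32_t key;   /* stand-in for the connection's address/port tuple */
	uint16_t next;  /* pool index of the next entry in the same bucket */
	uint8_t  used;
};

struct ex_vtw {
	struct ex_vpcb pool[EX_POOL_SIZE];
	uint16_t bucket[EX_NBUCKETS]; /* head index of each hash chain */
	uint16_t oldest;              /* slot that will be reused next */
};

static void
ex_vtw_init(struct ex_vtw *v)
{
	memset(v, 0, sizeof(*v));
	for (int i = 0; i < EX_NBUCKETS; i++)
		v->bucket[i] = EX_NIL;
}

/* Unlink pool slot idx from its hash chain and mark it free. */
static void
ex_vtw_unlink(struct ex_vtw *v, uint16_t idx)
{
	uint16_t *link = &v->bucket[v->pool[idx].key % EX_NBUCKETS];

	while (*link != EX_NIL && *link != idx)
		link = &v->pool[*link].next;
	assert(*link == idx);
	*link = v->pool[idx].next;
	v->pool[idx].used = 0;
}

/* Insert key; slots are reused in ring order, so the oldest entry goes first. */
static void
ex_vtw_insert(struct ex_vtw *v, uint32_t key)
{
	uint16_t idx = v->oldest;

	if (v->pool[idx].used)
		ex_vtw_unlink(v, idx);
	v->pool[idx].key = key;
	v->pool[idx].used = 1;
	v->pool[idx].next = v->bucket[key % EX_NBUCKETS];
	v->bucket[key % EX_NBUCKETS] = idx;
	v->oldest = (uint16_t)((idx + 1) % EX_POOL_SIZE);
}

/* Return 1 if key is present, 0 otherwise. */
static int
ex_vtw_lookup(const struct ex_vtw *v, uint32_t key)
{
	for (uint16_t i = v->bucket[key % EX_NBUCKETS]; i != EX_NIL;
	    i = v->pool[i].next)
		if (v->pool[i].key == key)
			return 1;
	return 0;
}
```

Each link costs two bytes here rather than a full pointer, which is the
memory saving the paragraph refers to.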

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.21 13-Jul-2010  rmind branches: 1.21.2;
Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@
 1.20 25-Jan-2008  joerg branches: 1.20.10; 1.20.30; 1.20.32;
Refactor in_cksum/in4_cksum/in6_cksum implementations:
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one's-complement sum.
The default implementation is moderately fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.

This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
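
For readers unfamiliar with the technique, a one's-complement sum with a
32-bit accumulator and 16-bit addends can be sketched as below. This is
not cpu_in_cksum: the name ex_in_cksum, the flat-buffer interface (the
real code walks mbuf chains), and the early-fold strategy are assumptions
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative Internet checksum over a flat buffer.  Bytes are combined
 * into big-endian 16-bit words; the result is returned as a host-order
 * value.
 */
static uint16_t
ex_in_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(buf != NULL || len == 0);

	while (len >= 2) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
		/* Fold early so the accumulator cannot overflow on long input. */
		if (sum & 0x80000000u)
			sum = (sum & 0xffff) + (sum >> 16);
	}
	if (len == 1)                   /* odd trailing byte, padded with zero */
		sum += (uint32_t)buf[0] << 8;

	while (sum >> 16)               /* add the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Running the function over data that already contains a correct checksum
yields 0, which is how a receiver validates a header.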
 1.19 02-May-2007  dyoung branches: 1.19.8; 1.19.14;
Remove obsolete files netinet/in_route.[ch].
 1.18 02-May-2007  dyoung Remove unused option.
 1.17 09-Dec-2006  dyoung branches: 1.17.2; 1.17.6; 1.17.8;
Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.16 25-Nov-2006  yamt move tso-by-software code to their own files. no functional changes.
 1.15 23-Nov-2006  tron Backout accidental commit which broke kernel builds.
 1.14 23-Nov-2006  rpaulo New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.13 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
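
The address predicates in item 3 amount to prefix tests. The sketch below
uses hypothetical ex_* functions rather than the kernel macros, takes
addresses in host byte order, and assumes that "link-local multicast"
means 224.0.0/24.

```c
#include <stdint.h>

/* Link-local unicast: 169.254/16. */
static int
ex_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* RFC 1918 private addresses: 10/8, 172.16/12 and 192.168/16. */
static int
ex_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||
	    (a & 0xfff00000u) == 0xac100000u ||
	    (a & 0xffff0000u) == 0xc0a80000u;
}

/* Link-local unicast or link-local multicast (assumed to be 224.0.0/24). */
static int
ex_in_any_local(uint32_t a)
{
	return ex_in_linklocal(a) || (a & 0xffffff00u) == 0xe0000000u;
}
```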
 1.12 09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.11 11-Dec-2005  christos branches: 1.11.20; 1.11.22;
merge ktrace-lwp.
 1.10 28-Feb-2005  jonathan branches: 1.10.4;
Commit TCP SACK patches from Kentaro A. Kurahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond
to SACKs in wide-area traffic.

There are two independently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays in fast retransmission
(potentially explainable by a 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unexplained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over. Current hypothesis
is that the anomalies above may in fact be due to link-level (hardware,
driver, HAL, firmware) aberrations in the test setup, affecting both
Kentaro's wired-Ethernet NIC and my two (different) WiFi NICs.
 1.9 13-Jan-2005  drochner branches: 1.9.2; 1.9.4;
compile tcp_debug.c only if the TCP_DEBUG option is set,
and remove the "#ifdef TCP_DEBUG" around everything
 1.8 04-Sep-2004  manu IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.7 01-May-2004  matt defflag TCP_OUTPUT_COUNTERS and TCP_REASS_COUNTERS
 1.6 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c,
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.5 26-Nov-2003  itojun always compile ip_id.c
 1.4 26-Nov-2003  itojun define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.
 1.3 17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids are now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.2 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.1 10-Oct-2002  thorpej branches: 1.1.2; 1.1.8;
Move netinet, netinet6, ipsec, and ipfilter config defns to
netinet/files.ipfilter, netinet/files.netinet, netinet6/files.netinet6,
and netinet6/files.netipsec.

XXX There are still a few stragglers in conf/files, which are entangled
with other network protocols.
 1.1.8.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.8.4 17-Jan-2005  skrll Sync with HEAD.
 1.1.8.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.8.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.8.1 03-Aug-2004  skrll Sync with HEAD
 1.1.2.2 18-Oct-2002  nathanw Catch up to -current.
 1.1.2.1 10-Oct-2002  nathanw file files.netinet was added on branch nathanw_sa on 2002-10-18 02:45:16 +0000
 1.9.4.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.9.2.1 29-Apr-2005  kent sync with -current
 1.10.4.3 04-Feb-2008  yamt sync with head.
 1.10.4.2 03-Sep-2007  yamt sync with head.
 1.10.4.1 30-Dec-2006  yamt sync with head.
 1.11.22.2 10-Dec-2006  yamt sync with head.
 1.11.22.1 22-Oct-2006  yamt sync with head
 1.11.20.2 12-Jan-2007  ad Sync with head.
 1.11.20.1 18-Nov-2006  ad Sync with head.
 1.17.8.1 11-Jul-2007  mjf Sync with head.
 1.17.6.1 08-Jun-2007  ad Sync with head.
 1.17.2.1 07-May-2007  yamt sync with head.
 1.19.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.19.8.1 23-Mar-2008  matt sync with HEAD
 1.20.32.2 31-May-2011  rmind sync with head
 1.20.32.1 05-Mar-2011  rmind sync with head
 1.20.30.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.20.10.1 11-Aug-2010  yamt sync with head.
 1.21.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.23.2.1 30-Oct-2012  yamt sync with head
 1.24.16.3 28-Aug-2017  skrll Sync with HEAD
 1.24.16.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.24.16.1 06-Apr-2015  skrll Sync with HEAD
 1.24.2.1 03-Dec-2017  jdolecek update from HEAD
 1.28.16.1 03-Apr-2021  thorpej Sync with HEAD.
 1.61 06-Dec-2024  riastradh netinet/icmp6.h: Nix trailing whitespace.

No functional change intended.
 1.60 06-Dec-2024  riastradh netinet/icmp6.h: Need sys/types.h and netinet/in.h.

- sys/types.h for u_intN_t
- netinet/in.h for struct in6_addr
 1.59 29-Aug-2022  knakahara branches: 1.59.10;
Add a sysctl entry to control whether routing messages are sent for RTM_DYNAMIC.

Some routing daemons require such routing messages to stay coherent.

To have the kernel send such messages, set net.inet.icmp.dynamic_rt_msg=1
for IPv4 and net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
The default (=0) is the same as before, that is, no such routing message is sent.
 1.58 22-Aug-2022  knakahara Add a sysctl entry to enable/disable path MTU discovery for ICMPv6 reflection.

To use path MTU discovery for ICMPv6 reflection, set
net.inet6.icmp6.reflect_pmtu=1. The default (=0) is the same as before, that is,
use IPV6_MINMTU.
 1.57 27-Jul-2020  roy icmp6: Remove __packed attribute from icmp6 structures

They should naturally align.
Add compile time assertions to icmp6.c to prove this.
 1.56 15-Jun-2020  roy icmp6.h: #define ND_RA_FLAG_PROXY

RFC 4389, experimental. Maybe someone will implement it one day.
 1.55 15-Jun-2020  roy icmp6.h: #define ND_OPT_PI_FLAG_ROUTER

We already define ND_RA_FLAG_HOME_AGENT and that kind of requires
ND_OPT_PI_FLAG_ROUTER.
 1.54 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.53 09-Mar-2020  roy route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.
 1.52 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.51 24-Apr-2018  maxv branches: 1.51.2;
Add code 3 of paramprob, part of RFC7112: "IPv6 First Fragment has
incomplete IPv6 Header Chain". Handle this code in ping6.
 1.50 06-Mar-2018  roy nd6: add a nonce to DaD probes in case they are looped back to us

This implements RFC 7527, based on a similar change in FreeBSD.
 1.49 23-Jan-2018  maxv branches: 1.49.2;
Style, localify, remove XXX when there's no issue, and switch 'extra'
to int.
 1.48 11-Dec-2016  ozaki-r Remove unnecessary forward struct declarations
 1.47 01-Jul-2013  christos branches: 1.47.8; 1.47.12;
Add MLD_LISTENER_REDUCTION per RFC 3542. Requested by Lorenzo Colitti.
 1.46 19-Jul-2012  spz branches: 1.46.2; 1.46.4;
<grmbl>whitespace</grmbl>
 1.45 19-Jul-2012  spz this commit contains two sets of unrelated changes:
"while I was here" I checked other KAME implementations for their icmp6.h
version, and thus:
- added a define for MLDV2_LISTENER_REPORT from FreeBSD
- added defines for the missing ICMP6_DST_UNREACH codes

then on to what I actually wanted to do:
- adds strings for the types and codes (encapsulated by ICMP6_STRINGS)
for the use of npfctl and other tools that might want to parse
human-friendly names instead of the corresponding number for ipv6-icmp
types and codes.
The strings are ordered such that their index is (as far as is practical)
the number belonging to the name, which is why there are
icmp6_type_err (use directly) and icmp6_type_info (add 128)
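
The indexing scheme can be sketched as follows. The tables and names here
(ex_type_err, ex_type_info, ex_icmp6_type_name, and the strings
themselves) are hypothetical stand-ins, not the contents of icmp6.h. Only
the rule "error types index directly, informational types index at type
minus 128" comes from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Error messages: ICMPv6 types below 128, indexed by the type itself. */
static const char *const ex_type_err[] = {
	"reserved",           /* 0 */
	"unreach",            /* 1: destination unreachable */
	"packet-too-big",     /* 2 */
	"time-exceeded",      /* 3 */
	"parameter-problem",  /* 4 */
};

/* Informational messages: types from 128 up, indexed by (type - 128). */
static const char *const ex_type_info[] = {
	"echo-request",       /* 128 */
	"echo-reply",         /* 129 */
};

#define EX_NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Return a name for an ICMPv6 type, or NULL if these tables lack one. */
static const char *
ex_icmp6_type_name(uint8_t type)
{
	if (type < 128)
		return type < EX_NELEM(ex_type_err) ? ex_type_err[type] : NULL;
	type -= 128;
	return type < EX_NELEM(ex_type_info) ? ex_type_info[type] : NULL;
}
```

Splitting the table in two avoids padding a single array with well over a
hundred empty slots between the last error type and type 128.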
 1.44 10-Dec-2011  roy branches: 1.44.2;
Add RDNSS and DNSSL support, RFC6106.
Replace custom lists with TAILQ lists.
Clean up plenty of signed vs unsigned warnings and set WARNS=4.

Adapted from FreeBSD.
 1.43 11-Nov-2011  gdt branches: 1.43.4;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicate
that the host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.
 1.42 24-May-2011  spz branches: 1.42.4;
RA flood mitigation via a limit on accepted routes:
- introduce a limit for the routes accepted via IPv6 Router Advertisement:
a common 2 interface client will have 6, the default limit is 100 and
can be adjusted via sysctl
- report the current number of routes installed via RA via sysctl
- count discarded route additions. Note that one RA message is two routes.
This is at present only across all interfaces even though per-interface
would be more useful, since the per-interface structure complies to RFC2466
- bump kernel version due to the previous change
- adjust netstat to use the new value (with netstat -p icmp6)
 1.41 08-May-2011  spz update (unused) ND option identifiers and corresponding comments
 1.40 31-Oct-2009  christos branches: 1.40.4; 1.40.6;
add enough info to let rtadvd compile with route-info.
 1.39 11-Jul-2008  cyber Add IANA allocation and header for RFC 5006 (RA RDNSS) IPv6 Router
Advertisement option.
 1.38 15-Apr-2008  thorpej branches: 1.38.4; 1.38.6; 1.38.8; 1.38.10;
Make ip6 and icmp6 stats per-cpu.
 1.37 08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.36 25-Dec-2007  perry branches: 1.36.2; 1.36.6;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.35 07-Mar-2006  wiz branches: 1.35.36; 1.35.42; 1.35.46; 1.35.50;
'advertisment' -> 'advertisement', from leonardo chiquitto filho
via jmc@openbsd.
 1.34 05-Mar-2006  rpaulo branches: 1.34.2;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on an interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.33 21-Jan-2006  rpaulo branches: 1.33.2; 1.33.4; 1.33.6;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.32 10-Dec-2005  elad branches: 1.32.2;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.31 29-May-2005  christos branches: 1.31.2;
remove stupid hand-rolled loop and kernel conditional.
 1.30 21-Apr-2004  itojun no space between function name and paren: foo (blah) -> foo(blah)
 1.29 18-Apr-2004  matt De __P()
 1.28 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.27 06-Jun-2003  itojun branches: 1.27.2;
separate RFC2292 decls for MLD; sync w/ kame
 1.26 06-Jun-2003  itojun - sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).
 1.25 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.24 09-Jun-2002  itojun whitespace
 1.23 08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two ioctls used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.22 29-May-2002  itojun move per-interface ip6/icmp6 stat to ifnet->if_afdata. sync w/kame
 1.21 21-Dec-2001  itojun branches: 1.21.8;
have packed attribute to protocol structs. sync with kame
 1.20 07-Dec-2001  itojun correct timing to increment icmp6 MIB variables. sync with kame
 1.19 07-Feb-2001  itojun branches: 1.19.2; 1.19.4;
during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flooding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronization with kame (including comments and cosmetic ones).
 1.18 22-Jan-2001  itojun fix RR result bit in little endian systems. sync with kame
 1.17 21-Jan-2001  itojun sync with latest kame.
- make icmp6.h spec conformant to 2292bis-02, regarding the router renumbering
flag bit.
- latest rtadvd.
 1.16 09-Dec-2000  itojun update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
 1.15 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.14 10-Oct-2000  itojun sync with kame ($KAME$)
 1.13 03-Aug-2000  itojun clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.
 1.12 03-Aug-2000  itojun correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.
 1.11 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.10 06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).
 1.9 12-Jun-2000  itojun branches: 1.9.2;
better conformance to draft-ietf-ipngwg-icmp-name-lookups-05.
the old code was a chimera of the 03 and 05 drafts.

-n by default, since IPv6 reverse lookup takes too much time.
use -H to enable reverse name lookup.
 1.8 09-Mar-2000  itojun branches: 1.8.2;
change member name for icmp6_filter, to be conformant to RFC2292.
From: Francis Dupont
 1.7 28-Feb-2000  itojun support draft-ietf-ipngwg-icmp-name-lookups-05.txt, drop support for
draft-ietf-ipngwg-icmp-name-lookups-04.txt.

There are certain bitfield changes from the 04 draft to the 05 draft, which make
04 "ping6 -a" and 05 "ping6 -a" not interoperable. sigh.
 1.6 28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.5 26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code that works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.4 06-Feb-2000  itojun to be more rfc2292 compliant, move ip6.h and icmp6.h into netinet.
(netinet6/{ip6,icmp6}.h is non-standard path - these files should go away)

it was not possible to use cvsmove in this case.
when you try to look at history, chase it toward netinet6/{ip6,icmp6}.h.
 1.3 03-Jul-1999  thorpej branches: 1.3.2;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file icmp6.h was initially added on branch kame.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file icmp6.h was added on branch chs-ubc2 on 1999-07-01 23:47:00 +0000
 1.3.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.9.2.4 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are rough estimate - we had some trial and error in main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.9.2.3 04-Aug-2000  itojun pullup (approved by releng-1-5)
sys/netinet6/icmp6.h 1.11 -> 1.13
sys/netinet6/icmp6.c 1.39 -> 1.41

cvs rdiff -r1.11 -r1.12 syssrc/sys/netinet/icmp6.h
cvs rdiff -r1.39 -r1.40 syssrc/sys/netinet6/icmp6.c

correct typo in #define. ICMP6_NI_SUCESS -> SUCCESS (notice missing C).
sync with kame.

cvs rdiff -r1.12 -r1.13 syssrc/sys/netinet/icmp6.h
cvs rdiff -r1.40 -r1.41 syssrc/sys/netinet6/icmp6.c

clarifications in icmp6 node query support.
XXX previous commit included "supported qtypes" icmp6 node query support.
sorry commit message was mistaken.
 1.9.2.2 20-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)
- do not use bitfield for router renumbering header.
part of sys/netinet/icmp6.h 1.9 -> 1.10
 1.9.2.1 20-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)
- add protection mechanism against ND cache corruption due to bad NUD hints.

this is part of:
sys/netinet/icmp6.h 1.9 -> 1.10
sys/netinet/tcp_input.c 1.111 -> 1.112
sys/netinet6/icmp6.c 1.34 -> 1.35
sys/netinet6/nd6.c 1.30 -> 1.31
sys/netinet6/nd6.h 1.14 -> 1.15
 1.19.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.19.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.19.2.3 11-Nov-2002  nathanw Catch up to -current
 1.19.2.2 20-Jun-2002  nathanw Catch up to -current.
 1.19.2.1 08-Jan-2002  nathanw Catch up to -current.
 1.21.8.2 20-Jun-2002  gehenna catch up with -current.
 1.21.8.1 30-May-2002  gehenna Catch up with -current.
 1.27.2.5 11-Dec-2005  christos Sync with head.
 1.27.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.27.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.27.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.27.2.1 03-Aug-2004  skrll Sync with HEAD
 1.31.2.2 21-Jan-2008  yamt sync with head
 1.31.2.1 21-Jun-2006  yamt sync with head.
 1.32.2.1 01-Feb-2006  yamt sync with head.
 1.33.6.1 13-Mar-2006  yamt sync with head.
 1.33.4.1 22-Apr-2006  simonb Sync with head.
 1.33.2.1 09-Sep-2006  rpaulo sync with head
 1.34.2.1 19-Apr-2006  elad sync with head.
 1.35.50.1 02-Jan-2008  bouyer Sync with HEAD
 1.35.46.1 26-Dec-2007  ad Sync with head.
 1.35.42.1 18-Feb-2008  mjf Sync with HEAD.
 1.35.36.1 09-Jan-2008  matt sync with HEAD
 1.36.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.36.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.36.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.38.10.1 19-Oct-2008  haad Sync with HEAD.
 1.38.8.1 18-Jul-2008  simonb Sync with head.
 1.38.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.38.4.2 11-Mar-2010  yamt sync with head
 1.38.4.1 04-May-2009  yamt sync with head.
 1.40.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.40.4.1 31-May-2011  rmind sync with head
 1.42.4.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.42.4.2 30-Oct-2012  yamt sync with head
 1.42.4.1 17-Apr-2012  yamt sync with head
 1.43.4.1 18-Feb-2012  mrg merge to -current.
 1.44.2.1 25-Jul-2012  jdc Pull up revisions:
src/sys/netinet/icmp6.h revisions 1.45,1.46
(requested by rmind in ticket #434).

this commit contains two sets of unrelated changes:
"while I was here" I checked other KAME implementations for their icmp6.h
version, and thus:
- added a define for MLDV2_LISTENER_REPORT from FreeBSD
- added defines for the missing ICMP6_DST_UNREACH codes

then on to what I actually wanted to do:
- adds strings for the types and codes (encapsulated by ICMP6_STRINGS)
for the use of npfctl and other tools that might want to parse
human-friendly names instead of the corresponding number for ipv6-icmp
types and codes.
The strings are ordered such that their index is (as far as is practical)
the number belonging to the name, which is why there are
icmp6_type_err (use directly) and icmp6_type_info (add 128)

<grmbl>whitespace</grmbl>
 1.46.4.1 28-Aug-2013  rmind sync with head
 1.46.2.2 03-Dec-2017  jdolecek update from HEAD
 1.46.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.47.12.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.47.8.1 05-Feb-2017  skrll Sync with HEAD
 1.49.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.49.2.2 02-May-2018  pgoyette Synch with HEAD
 1.49.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.51.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.51.2.1 10-Jun-2019  christos Sync with HEAD
 1.59.10.1 02-Aug-2025  perseant Sync with HEAD
 1.5 17-Feb-2021  christos - pass the alignment instead of the mask (as Roy asked and to match the
other macro)
- use alignof to determine that alignment and CTASSERT what we expect
- remove unused macros
 1.4 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.3 28-Apr-2008  martin branches: 1.3.4; 1.3.102;
Remove clause 3 and 4 from TNF licenses
 1.2 23-Apr-2008  thorpej branches: 1.2.2;
Use <net/net_stats.h> / netstat_sysctl().
 1.1 12-Apr-2008  thorpej branches: 1.1.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.1.2.1 18-May-2008  yamt sync with head.
 1.2.2.1 16-May-2008  yamt sync with head.
 1.3.102.1 03-Apr-2021  thorpej Sync with HEAD.
 1.3.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.4.1 28-Apr-2008  mjf file icmp_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:23 +0000
 1.32 29-Aug-2022  knakahara Add sysctl entries to control whether routing messages are sent for RTM_DYNAMIC.

Some routing daemons require such routing messages to stay coherent.

To let the kernel send such messages, set net.inet.icmp.dynamic_rt_msg=1
for IPv4 and net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
The default (0) keeps the previous behavior: no such routing messages are sent.
 1.31 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.30 18-Feb-2015  christos branches: 1.30.16; 1.30.18;
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
XXX: pullup-7
 1.29 24-Dec-2011  christos branches: 1.29.2; 1.29.6; 1.29.8; 1.29.16; 1.29.22; 1.29.24;
put the histograms last and make them autosize (breaks compat with netstat).
 1.28 07-Dec-2009  christos branches: 1.28.12; 1.28.16;
PR/42243: Yasuoka Masahiko: Add "net.inet.icmp.bmcastecho" sysctl support,
to disable icmp replies to the broadcast address.
 1.27 12-Apr-2008  thorpej branches: 1.27.4;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.26 06-Apr-2008  thorpej Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.
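The two entries above (an array of uint64_t counters, kept per-CPU and collated on request) can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: `NCPU_SKETCH`, the `STAT_*` indices, `stat_inc()` and `stats_collate()` are hypothetical names, not the kernel's API, and a plain two-dimensional array stands in for real per-CPU storage.

```c
#include <stdint.h>
#include <string.h>

#define NCPU_SKETCH 4			/* hypothetical fixed CPU count */

enum { STAT_ERROR, STAT_REFLECT, STAT_NSTATS };	/* hypothetical counter indices */

/* One row of counters per CPU; each CPU only ever touches its own row. */
static uint64_t percpu_stats[NCPU_SKETCH][STAT_NSTATS];

/* Bump one counter on the given CPU; out-of-range arguments are ignored. */
void
stat_inc(unsigned cpu, unsigned idx)
{
	if (cpu < NCPU_SKETCH && idx < STAT_NSTATS)
		percpu_stats[cpu][idx]++;
}

/* Sum every CPU's row into out[], as a reader of the statistics would. */
void
stats_collate(uint64_t out[STAT_NSTATS])
{
	memset(out, 0, sizeof(uint64_t) * STAT_NSTATS);
	for (unsigned cpu = 0; cpu < NCPU_SKETCH; cpu++)
		for (unsigned i = 0; i < STAT_NSTATS; i++)
			out[i] += percpu_stats[cpu][i];
}
```

The point of the layout is that increments stay local to one CPU and the summing cost is paid only when somebody asks for the totals.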
 1.25 10-Dec-2005  elad branches: 1.25.70;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.24 05-Aug-2005  elad Add sysctls for IP, ICMP, TCP, and UDP statistics.
 1.23 03-Aug-2004  cube branches: 1.23.12;
Remove a common (icmpstat).
 1.22 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.21 30-Jun-2002  thorpej branches: 1.21.6;
Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
 1.20 09-Jun-2002  itojun whitespace
 1.19 30-Oct-2001  kml branches: 1.19.8;
Add in support for timing out IPv4 routes added due to redirects,
as discussed in tech-net several weeks ago. It turned out that
KAME had already added this functionality to the IPv6 stack, so
I followed their example in adding the sysctl variables
net.inet.icmp.rediraccept and net.inet.icmp.redirtimeout.
 1.18 18-Oct-2000  itojun branches: 1.18.2; 1.18.4; 1.18.8;
count successful path MTU changes. good for debugging.
(there could be some discussion on when to increase the counter...)
 1.17 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.16 10-Jul-2000  itojun implement net.inet.icmp.errppslimit.
make the default value of net.inet.icmp.errratelimit 0, as a value < 10ms
does not do the right thing.
 1.15 10-Jun-2000  darrenr branches: 1.15.2;
add icmpreturndatabytes kernel variable (default 8) which specifies the
number of extra data bytes to return in ICMP error messages. This is
also available via sysctl as net.inet.icmp.returndatabytes and is limited to
[8,512].
 1.14 15-Feb-2000  thorpej branches: 1.14.2;
Add ICMP error rate limiting, based on the same for ICMP6.

Note, we're reusing the previously unused slot for "MTU discovery" (which
was moved to the "net.inet.ip" branch of the sysctl tree quite some time
ago).
 1.13 19-Nov-1999  bouyer Update protocols and interfaces stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.12 10-Feb-1998  perry branches: 1.12.14; 1.12.20;
add/cleanup multiple inclusion protection.
 1.11 18-Oct-1997  kml remove extraneous icmp_do_mtudisc
 1.10 18-Oct-1997  kml change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
 1.9 17-Oct-1997  kml Path MTU Discovery support. This is turned off by default.
Use sysctl -w net.inet.icmp.mtudisc=1 to turn on.
Still to come: path removal after some period, black hole detection
 1.8 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.7 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.6 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.5 10-Jan-1994  mycroft Change the counters to be all the same type -- u_long.
 1.4 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.12.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.15.2.1 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are rough estimate - we had some trial-and-error in main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.18.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.18.4.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.18.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.18.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.18.2.3 01-Aug-2002  nathanw Catch up to -current.
 1.18.2.2 20-Jun-2002  nathanw Catch up to -current.
 1.18.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.19.8.2 15-Jul-2002  gehenna catch up with -current.
 1.19.8.1 20-Jun-2002  gehenna catch up with -current.
 1.21.6.6 11-Dec-2005  christos Sync with head.
 1.21.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.21.6.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.21.6.3 18-Sep-2004  skrll Sync with HEAD.
 1.21.6.2 12-Aug-2004  skrll Sync with HEAD.
 1.21.6.1 03-Aug-2004  skrll Sync with HEAD
 1.23.12.1 21-Jun-2006  yamt sync with head.
 1.25.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.27.4.1 11-Mar-2010  yamt sync with head
 1.28.16.1 18-Feb-2012  mrg merge to -current.
 1.28.12.1 17-Apr-2012  yamt sync with head
 1.29.24.1 06-Apr-2015  skrll Sync with HEAD
 1.29.22.1 21-Feb-2015  martin Pull up following revision(s) (requested by christos in ticket #537):
sys/netinet/icmp_var.h: revision 1.30
sys/netinet/ip_icmp.h: revision 1.34
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
XXX: pullup-7
 1.29.16.1 21-Feb-2015  martin Pull up following revision(s) (requested by christos in ticket #1258):
sys/netinet/icmp_var.h: revision 1.30
sys/netinet/ip_icmp.h: revision 1.34
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
 1.29.8.1 21-Feb-2015  martin Pull up following revision(s) (requested by christos in ticket #1258):
sys/netinet/icmp_var.h: revision 1.30
sys/netinet/ip_icmp.h: revision 1.34
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
 1.29.6.1 03-Dec-2017  jdolecek update from HEAD
 1.29.2.1 21-Feb-2015  martin Pull up following revision(s) (requested by christos in ticket #1258):
sys/netinet/icmp_var.h: revision 1.30
sys/netinet/ip_icmp.h: revision 1.34
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
 1.30.18.1 10-Jun-2019  christos Sync with HEAD
 1.30.16.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.7 16-Mar-1997  is move if_arc.h to sys/net
 1.6 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.5 07-Jun-1995  cgd branches: 1.5.8;
update from Ignatios Souvatzis
 1.4 14-Apr-1995  chopps update arc_input() proto to match reality.
 1.3 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.2 02-Mar-1995  chopps add prototypes
 1.1 23-Feb-1995  glass preliminary arcnet support. uses lame but RFC address resolution
 1.5.8.2 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.5.8.1 08-Feb-1997  is Extinguish the link level address from struct arccom, too.
XXX Todo: change this in the hardware driver.
 1.317 13-Nov-2024  roy ARP/ND6: Revert prior

Turns out some people actually use this behaviour and strictly speaking
it is allowed by RFC5227 2.4 where it says:

At any time, if a host receives
an ARP packet (Request *or* Reply) where the 'sender IP address' is
(one of) the host's own IP address(es) configured on that interface,
but the 'sender hardware address' does not match any of the host's
own interface addresses, then this is a conflicting ARP packet

The key part is "any of the host's own interface addresses".
 1.316 04-Oct-2024  roy ARP: only ignore messages from the receiving interface

This means that you can no longer share the same IP addresses
on different interfaces connected to the same subnet.
This was a very non standard solution and you can now
get the same functionality by using lagg(4) instead.

While here, flip log movements sysctl back to 1.

This reverts r1.253 and r1.286.
 1.315 09-Sep-2024  ozaki-r arp: allow to send packets without an ARP resolution just after receiving an ARP request

On receiving an ARP request, the current implementation creates an ARP
cache entry but with ND_LLINFO_NOSTATE. Such an entry still needs
an ARP resolution to send back a packet to the requester. The original
behavior before introducing the common ND framework didn't need the
resolution. IPv6 doesn't as well. To restore the original behavior,
make a new ARP cache entry with ND_LLINFO_STALE like IPv6 does.
 1.314 20-Aug-2024  ozaki-r arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
our own address, i.e., address duplication, the original behavior of
a kernel prior to supporting IPv4 DAD is to send an ARP reply. It is
the same with the latest kernel with DAD enabled. However, the latest
kernel without DAD sends back a GARP packet. Restore the original
behavior.
 1.313 29-Jun-2024  riastradh branches: 1.313.2;
netinet: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.312 24-Feb-2024  mlelstv Attribute debug message.
Fixes PR 57959
 1.311 15-Nov-2022  roy branches: 1.311.2;
arp: Validate ARP source hardware address matches Ethernet source

RFC 5227 section 1.1 states that for a DaD ARP probe the sender hardware
address must match the hardware address of the interface sending the
packet.

We can now verify this by checking the mbuf tag PACKET_TAG_ETHERNET_SRC.

This fixes an obscure issue where an old router was sending out bogus
ARP probes.

Thanks to Ryo Shimizu <ryo@nerv.org> for the re-implementation.
 1.310 15-Nov-2022  roy Revert prior.
 1.309 14-Nov-2022  roy arp: Validate L2 sender hardware address matches ARP probe

RFC 5227 section 1.1 states that for a DaD ARP probe the sender hardware
address must match the hardware address of the interface sending the
packet.

We can now verify this by checking the mbuf packet header.

This fixes an obscure issue where an old router was sending out bogus
ARP probes.
 1.308 03-Sep-2022  thorpej Convert ARP from a legacy netisr to pktqueue.
 1.307 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
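The first item of the entry above (using the type's alignment rather than its size) can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `struct hdr6` and the two `ALIGNED_BY_*` macros are hypothetical and are not the kernel's definitions, and the 2-byte alignment of `struct hdr6` holds on common ABIs rather than being guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical non-primitive type: six bytes long, but on common ABIs
 * it only needs the 2-byte alignment of its uint16_t members.
 */
struct hdr6 {
	uint16_t a;
	uint16_t b;
	uint16_t c;
};

/* Size-based check: uses sizeof(t) - 1 as a mask, which is only
 * meaningful when sizeof(t) happens to be a power of two. */
#define ALIGNED_BY_SIZEOF(addr, t) \
	((((uintptr_t)(addr)) & (sizeof(t) - 1)) == 0)

/* Alignment-based check: uses the alignment the compiler requires for t. */
#define ALIGNED_BY_ALIGNOF(addr, t) \
	((((uintptr_t)(addr)) & (_Alignof(t) - 1)) == 0)
```

For an address such as 4, the alignment-based check accepts a `struct hdr6` while the size-based one rejects it, because the mask derived from a size of 6 is not a valid alignment mask.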
 1.306 16-Feb-2021  martin One more time: backout arp header alignment, now that the alignment
asserted has been aligned to reality.
Also remove unused ARP_HDR_ALIGNED_P macro. Pointed out by roy.
 1.305 16-Feb-2021  martin Undo previous backout: alignment is needed here.
The reason for the previous backout was a misunderstanding (POINTER_ALIGNED_P
was broken, but the assertion fired even after it got fixed).
 1.304 15-Feb-2021  christos Undo previous; POINTER_ALIGNED_P was broken.
 1.303 15-Feb-2021  christos put back alignment (reported by martin@)
 1.302 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.301 14-Feb-2021  roy if_arp: Just KASSERT that arphrd is aligned

While here improve readability of checking ARP IEEE1394 matches interface.
 1.300 13-Feb-2021  roy Prior alignment fixes should not use an offset
 1.299 13-Feb-2021  roy if_arp: Ensure that arphdr is aligned
 1.298 02-Feb-2021  yamt arp: Plug an mbuf leak
 1.297 15-Sep-2020  roy branches: 1.297.2;
Implement RFC 7048, making Neighbor Unreachability Detection less impatient

RFC 7048 Section 3 says in the UNREACHABLE state packets continue to be
sent to the link-layer address and then backoff exponentially.
We adjust this slightly and move to the INCOMPLETE state after
`nd_mmaxtries` probes and then start backing off.

This results in simpler code whilst providing a more robust model which
doubles the time to failure over what we did before.
We don't want to be back to the old ARP model where no unreachability
errors are returned because very few applications would look at
unreachability hints provided such as ND_LLINFO_UNREACHABLE or RTM_MISS.
 1.296 14-Sep-2020  roy nd: Name l3addr union of llentry and use in-place of nd_addr.

Probably makes more sense and makes nd.h less messy.
 1.295 11-Sep-2020  roy ARP: Use ND rather than our own.

This brings the benefit of Neighbour Unreachability Detection which is
something ARP sorely lacks.

The new timings mirror those of IPv6 and are adjustable via sysctl(8).
Unlike IPv6 ND, these are global and not per interface.
 1.294 09-Mar-2020  roy route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes that probe for machines
that don't exist on a subnet / prefix.
 1.293 09-Mar-2020  roy arp: report RTM_MISS when removing an unresolved entry in the arp table

Otherwise we only get it when renewing and we've sent too many requests.
This mirrors INET6 behaviour.
 1.292 23-Jan-2020  roy arp: find source address then target address when processing input

This fixes the case where another host having a duplicate ip address
starts using it right away without probing for its availability.

While here, prefer ifatoia over a strict cast.
 1.291 20-Jan-2020  thorpej Remove FDDI support.
 1.290 19-Jan-2020  thorpej Remove Token Ring support.
 1.289 11-Oct-2019  roy branches: 1.289.2;
ARP: Don't defend ARP probes.

We should let the nature of ARP take its course here when our address
is neither tentative nor duplicated.
This allows the host to work with ARP ping, which was broken in r1.279.
 1.288 25-Sep-2019  ozaki-r Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.287 01-Sep-2019  roy inet: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.
This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This tells us when a new lladdr has been added (RTM_ADD),
changed (RTM_CHANGE), deleted (RTM_DELETE) or has failed to be
resolved (RTM_MISS). The latter case can be interpreted as unreachable.
 1.286 30-Aug-2019  roy ARP: change default sysctl entry log_movements to 0

IP address sharing is a thing and shouldn't cause needless diagnostics
by default.
 1.285 30-Aug-2019  roy ARP: remove unused sysctl entry log_unknown_network
 1.284 22-Aug-2019  roy rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9
 1.283 19-Aug-2019  ozaki-r Initialize dom_mowner for MBUFTRACE
 1.282 29-Apr-2019  roy branches: 1.282.2;
Introduce rt_addrmsg_src which adds RTA_AUTHOR to the message.
Use this when we notify userland of a duplicate address
and set RTA_AUTHOR to the hardware address of the sender.

While here, match the logging diagnostic of INET6 to the simpler one
of INET so it's consistent.
 1.281 29-Apr-2019  roy Move lla_snprintf from if_arp.c to dl_print.c
 1.280 29-Apr-2019  roy rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.279 24-Apr-2019  roy ARP: Enable Address Defence again.

Revert the tentative/duplicated check and test for if it's been broadcast
or not. This reverts r1.245.
 1.278 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.277 29-Nov-2018  ozaki-r Introduce and use ip_dad_enabled() and ip6_dad_enabled() functions
 1.276 30-Oct-2018  ozaki-r Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.
 1.275 11-May-2018  maxv branches: 1.275.2;
static
 1.274 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.273 11-Apr-2018  maxv Add XXX.
 1.272 10-Apr-2018  maxv Remove unused mbuf argument from arpcreate() and arplookup().
 1.271 08-Mar-2018  ozaki-r Fix a race condition on DAD destructions (again)

The previous fix to DAD timers was wrong; it avoided a use-after-free but
instead introduced a memory leak. The destruction method had delegated
a destruction of a DAD timer to the timer itself and told that by setting NULL
to dp->dad_ifa. However, the previous fix made DAD timers do nothing on
the sign.

Fixing the issue with using callout_stop isn't easy. One approach is to have
a refcount on dp but it introduces extra complexity that we want to avoid.

The new fix falls back to using callout_halt, which was abandoned because of
softnet_lock. Fortunately now the network stack is protected by KERNEL_LOCK
so we can remove softnet_lock from DAD timers (callout) and use callout_halt
safely.
 1.270 06-Mar-2018  ozaki-r Fix reference leaks of llentry

callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).

While here, we can remove remaining abuses of mutex_owned for softnet_lock.
 1.269 06-Mar-2018  ozaki-r Tweak sanity checks

Scheduling a timer of static entries is wrong.
 1.268 01-Mar-2018  roy branches: 1.268.2;
Sprinkle some const.
 1.267 24-Feb-2018  ozaki-r Avoid a race condition of DAD timer destructions

When we see dp->dad_ifa == NULL, it means that the ifa is being deleted and also
the callout is scheduled again by someone. We shouldn't rely on a result of
callout_pending to know if the callout is scheduled because it returns false if
the subsequent callout handler is already on the fly.

We have to always delegate the destruction of dp to the subsequent handler
unconditionally if dp->dad_ifa == NULL. Otherwise, the first handler destroys
the dp and the second handler tries to handle destroyed dp.
 1.266 14-Feb-2018  maxv Remove IFF_STATICARP, we don't support this, and the code is useless in its
current form.

ok ozaki-r@
 1.265 13-Feb-2018  maxv Mmh. Add a missing check: if ARP was disabled on the interface, don't
process ARP packets. Otherwise the kernel will add ARP entries even if

ifconfig wm0 -arp

was entered.
 1.264 13-Feb-2018  maxv Be tougher:

* In arpintr(), don't allow IEEE1394 packets on non-IEEE1394 interfaces.

* In revarpinput(), kick IEEE1394 packets right away. They are not
supported.
 1.263 13-Feb-2018  maxv Same change as rev1.258, but this time in revarpinput: use m_pullup.
 1.262 13-Feb-2018  maxv Minor stylistic changes, and use C99 types.
 1.261 13-Feb-2018  maxv Replace dead code by KASSERT.
 1.260 13-Feb-2018  maxv Don't force ARPHRD_IEEE1394 on IEEE1394 interfaces. If it's not there, then
kick the packet. And do this earlier.
 1.259 13-Feb-2018  maxv Use only one label, clearer.
 1.258 13-Feb-2018  maxv Fix three things in arpintr():

* mtod can't return NULL.

* It is wrong to kick the packet if m->m_len < arplen. While this check
always returns false for native Ethernet interfaces, it may not if the
frame is encapsulated in EtherIP/L2TP. Use m_pullup instead.

* Remove XXX, it is fine. Reduce the indentation level afterwards.
 1.257 13-Feb-2018  maxv Style, no functional change.
 1.256 16-Jan-2018  ozaki-r Make DAD destructions (MP-)safe with callout_stop

arp_dad_stoptimer and nd6_dad_stoptimer can be called with or without
softnet_lock held and unfortunately we have no easy way to statically know which.
So it is hard to use callout_halt there.

To address the situation, we use callout_stop to make the code safe. The new
approach copes with the issue by delegating the destruction of a callout to
the callout itself, which allows us to not wait for the callout to finish. This
can be done because DAD objects are separated from other data such as ifa.

The approach is suggested by riastradh@
Proposed on tech-kern@ and tech-net@
 1.255 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.254 10-Nov-2017  ozaki-r Fix a deadlock between a route update and lltable

It happens because rtalloc1 is called from lltable while holding
IF_AFDATA_WLOCK.

If a route update is in action, rtalloc1 would wait for its completion while
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.

A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update

Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.

Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.253 27-Jun-2017  roy Use if_get_bylla() instead of just looking at the lla of the interface
the address belongs to.
This allows any ARP message we received from another interface to
be correctly dropped.

While here, move the protocol length check higher up the food chain.
 1.252 21-Jun-2017  ozaki-r Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.251 16-Jun-2017  ozaki-r Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@
 1.250 18-May-2017  ozaki-r branches: 1.250.2;
Lookup caches from a CARP interface if the packet is routed to the interface

This fixes CARP setups without carpdev (physical interface) having any IPs.
 1.249 12-May-2017  ryo replace in_fmtaddr() by IN_PRINT(), and delete function in_fmtaddr()
 1.248 04-Apr-2017  ozaki-r branches: 1.248.4;
Get rid of unused macros
 1.247 17-Mar-2017  roy If we're not doing DAD, don't set IN_IFF_TENTATIVE.
 1.246 10-Mar-2017  roy If an ARP packet is received to the null host (0.0.0.0) then look for
an address matching the sender IP address on the interface.
This allows DAD to fail during the probe phase when a reverse ARP
proxy is present.
 1.245 09-Mar-2017  roy Only check target address collision if the sender address is the null
address (ie a DAD probe) or our matching address is either TENTATIVE
or DUPLICATED.
 1.244 24-Feb-2017  roy Only do DaD if the interface actually has the address.
 1.243 21-Feb-2017  ozaki-r Replace malloc for DAD with kmem and move them out of the lock for DAD
 1.242 11-Feb-2017  roy Allow Unicast Poll from RFC 1122 to bypass DaD checking.
 1.241 07-Feb-2017  ozaki-r Add missing NULL checks for m_get_rcvif
 1.240 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.239 21-Jan-2017  maxv Add some checks, mostly same as in_arpinput.
 1.238 20-Jan-2017  maxv Make sure the protocol address length equals that of IPv4. Also, make sure
the hardware address length equals that of the interface we received the
packet on. Otherwise a packet could easily set them both to zero and make
the kernel read beyond the allocated mbuf, which is terrible.

Note: for the latter we drop the packet instead of replying, since it is
malformed.

Note: I also added an ugly hack in CARP, since it apparently expects at
least six bytes.
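The length checks described in the entry above can be sketched as a stand-alone function. This is a hedged illustration, not the kernel's code: `struct arp_fixed_hdr`, `IPV4_ADDR_LEN` and `arp_lengths_ok()` are hypothetical names, and the packet is represented only by its total length rather than by an mbuf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified fixed part of an ARP header. */
struct arp_fixed_hdr {
	uint16_t hrd;	/* hardware address format */
	uint16_t pro;	/* protocol address format */
	uint8_t  hln;	/* hardware address length, packet-controlled */
	uint8_t  pln;	/* protocol address length, packet-controlled */
	uint16_t op;	/* operation */
};

#define IPV4_ADDR_LEN 4

/*
 * Return 1 if the advertised lengths are acceptable and the packet is
 * long enough to hold all four addresses, 0 if it should be dropped.
 */
int
arp_lengths_ok(const struct arp_fixed_hdr *ah, size_t pktlen,
    uint8_t if_addrlen)
{
	size_t need;

	if (pktlen < sizeof(*ah))
		return 0;
	/* The protocol address length must be that of IPv4. */
	if (ah->pln != IPV4_ADDR_LEN)
		return 0;
	/* The hardware address length must match the receiving interface. */
	if (ah->hln != if_addrlen)
		return 0;
	/* Sender and target each carry one hardware and one protocol address. */
	need = sizeof(*ah) + 2 * ((size_t)ah->hln + ah->pln);
	return pktlen >= need;
}
```

With both lengths pinned to known values, a packet that sets them to zero is rejected before any address field is read.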
 1.237 20-Jan-2017  maxv Style
 1.236 20-Jan-2017  maxv Restore a null check that was mistakenly removed in rev1.204. ar_hrd is
packet-controlled.
 1.235 16-Jan-2017  christos rename arplog -> ARPLOG to make it clear that it is a macro and tuck-in the
buffer used for address formatting.
 1.234 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.233 12-Dec-2016  ozaki-r branches: 1.233.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.232 05-Nov-2016  roy Don't handle ARP duplication for the unspecified address.
 1.231 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.230 11-Oct-2016  roy Implement RFC 5227 2.4 Ongoing Conflict Detection and Address Defence.

If ip_dad_count is 0, then the conflict is just logged and the address
is not marked as duplicated.
 1.229 11-Oct-2016  roy Mark arprequest static and introduce arpannounce so that gratuitous
ARP requests are only sent from valid addresses.
 1.228 03-Oct-2016  ozaki-r Fix race condition on ifqueue used by traditional netisr

If an underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.

A Layer 2 routine and a Layer 3 routine share an ifqueue, and
IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it are now racy.

To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet, which is meaningless now.

The same race condition exists in route_intr. Fix it as well.

Reviewed by knakahara@
 1.227 18-Sep-2016  christos Dealing with arplog is a bit more complicated...
 1.226 16-Sep-2016  roy Clear IN_IFF_TENTATIVE when stopping DaD here.
 1.225 16-Sep-2016  roy Don't setup DaD for INADDR_ANY
 1.224 15-Sep-2016  roy Allow arplog to be used outside of if_arp.c
 1.223 07-Sep-2016  roy Refine arplog to be like nd6log.
 1.222 08-Aug-2016  ozaki-r Restore ARP_STAT_DFRTOTAL deleted unexpectedly
 1.221 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.220 28-Jul-2016  ozaki-r Fix panic on adding/deleting IP addresses under network load

Adding and deleting IP addresses aren't serialized with other network
operations, e.g., forwarding packets. So if we add or delete an IP
address under network load, a kernel panic may happen on manipulating
network-related shared objects such as rtentry and rtcache.

To avoid such panics, we still need to hold softnet_lock in in_control
and in6_control that are called via ioctl and do network-related operations
including IP address additions/deletions.

Fix PR kern/51356
 1.219 25-Jul-2016  ozaki-r Make DAD of ARP/NDP MP-safe with coarse-grained locks

The change also prevents arp_dad_timer/nd6_dad_timer from running if
arp_dad_stop/nd6_dad_stop is called, which makes sure that callout_reset
won't be called during callout_halt.
 1.218 25-Jul-2016  ozaki-r Use KASSERT for checking non-NULL of ifa->ifa_ifp

ifa->ifa_ifp should be always non-NULL, so doing the check only if
DIAGNOSTIC is ok.
 1.217 08-Jul-2016  ozaki-r branches: 1.217.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.216 06-Jul-2016  ozaki-r Tweak indentation
 1.215 01-Jul-2016  ozaki-r Make sure to free all interface addresses in if_detach

Addresses of an interface (struct ifaddr) have a (reverse) pointer of an
interface object (ifa->ifa_ifp). If the addresses are surely freed when
their interface is destroyed, the pointer is always valid and we don't
need a tweak of replacing the pointer to if_index like mbuf.

In order to guarantee this assumption, the following changes are required:
- Deactivate the interface early in if_detach. This prevents
  in6_unlink_ifa from saving multicast addresses (wrongly)
- Invalidate rtcache(s) and clear a rtentry referencing an address on
  RTM_DELETE. rtcache(s) may delay freeing an address
- Replace callout_stop with callout_halt of DAD timers to ensure stopping
  such timers in if_detach
 1.214 30-Jun-2016  ozaki-r Make sure that ifaddr is published after its initialization finished

Basically we should insert an item to a collection (say a list) after
item's initialization has been completed to avoid accessing an item
that is initialized halfway. ifaddr (in{,6}_ifaddr) isn't processed
like so and needs to be fixed.

In order to do so, we need to tweak {arp,nd6}_rtrequest that depend
on that an ifaddr is inserted during its initialization; they explore
interface's address list to determine that rt_getkey(rt) of a given
rtentry is in the list to know whether the route's interface should
be a loopback, which doesn't work after the change. To make it work,
first check RTF_LOCAL flag that is set in rt_ifa_addlocal that calls
{arp,nd6}_rtrequest eventually. Note that we still need the original
code for the case to remove and re-add a local interface route.
 1.213 28-Jun-2016  ozaki-r Add missing NULL checks for m_get_rcvif_psref
 1.212 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.211 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
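
The idea in rev 1.211 of storing an interface index rather than a pointer can be sketched in user space as follows. This is an illustration only: every name here (`ifnet_sketch`, `pkt_sketch`, `ifindex_table`, `pkt_get_rcvif()`) is hypothetical, and the sketch deliberately omits the psref(9)/pserialize(9) protection that makes the real lookup safe against concurrent interface destruction.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS 8

/* Hypothetical stand-in for struct ifnet. */
struct ifnet_sketch {
	unsigned int if_index;
	const char *if_name;
};

/* Hypothetical stand-in for the interface collection, indexed by if_index. */
static struct ifnet_sketch *ifindex_table[MAX_IFS];

/* A packet remembers the index of its receiving interface, not a pointer. */
struct pkt_sketch {
	unsigned int rcvif_index;
};

static void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	assert(ifp != NULL && ifp->if_index < MAX_IFS);
	ifindex_table[ifp->if_index] = ifp;
}

static void
if_detach_sketch(unsigned int idx)
{
	if (idx < MAX_IFS)
		ifindex_table[idx] = NULL;
}

/*
 * Resolve the receiving interface at use time. The caller must handle
 * NULL: the interface may have been destroyed since the packet arrived.
 */
static struct ifnet_sketch *
pkt_get_rcvif(const struct pkt_sketch *p)
{
	if (p->rcvif_index >= MAX_IFS)
		return NULL;
	return ifindex_table[p->rcvif_index];
}
```

A stale packet then resolves to NULL instead of dereferencing freed memory, which is why callers need a NULL check after the lookup.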
 1.210 17-May-2016  ozaki-r Get rid of unnecessary assignment
 1.209 25-Apr-2016  ozaki-r Check error of rt_setgate and rt_settag
 1.208 19-Apr-2016  ozaki-r Constify rtentry of arpresolve

We don't need to (rather shouldn't) modify rtentry in there.
 1.207 18-Apr-2016  ozaki-r Fix panic on receiving an ARP request

The panic happened if an ARP request has a spa (i.e., IP address) whose
ARP entry already exists in the table as a static ARP entry.
 1.206 13-Apr-2016  ozaki-r ddb: rename show arptab to show routes

show arptab command of ddb is now inappropriate because it actually dumps
routes but arp entries aren't routes anymore. So rename it to show routes
and move the code from if_arp.c to route.c.

ok christos@
 1.205 07-Apr-2016  christos - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
 1.204 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value as RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can see the details of the behavior changes in the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.203 25-Jan-2016  ozaki-r Remove unnecessary LLE_REMREF

The code around it was copied from arptimer, but LLE_REMREF
is unnecessary because it is needed only for arptimer that
is called after LLE_ADDREF.

This is a possible fix for PR#50548, PR#50702 and PR#50704.
 1.202 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.201 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.200 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.199 05-Jan-2016  ozaki-r Make revarprequest static
 1.198 17-Dec-2015  ozaki-r Fix memory leak of llentry#la_opaque

llentry#la_opaque which is for token ring is allocated in arp.c
and freed in arp.c when freeing llentry. However, llentry can be
freed from other places, e.g., lltable_free. In such cases,
la_opaque is never freed.

To fix that, add a new callback (lle_ll_free) to llentry and
register a destruction function of la_opaque to it. On freeing a
llentry, we can surely free la_opaque via the callback.
 1.197 16-Dec-2015  ozaki-r Fix token_rif extractions from llentry
 1.196 11-Dec-2015  ozaki-r Introduce arp_settimer

No functional change.
 1.195 30-Nov-2015  ozaki-r Get rid of a big block in in_arpinput

No functional change.
 1.194 19-Nov-2015  ozaki-r Restore softnet_lock and KERNEL_LOCK for rtrequest and rtfree

We still need them for rt operations.
 1.193 06-Nov-2015  ozaki-r Fix db_print_llinfo

rt_llinfo is now struct llentry.
 1.192 06-Nov-2015  ozaki-r Fix inappropriate rt_flags check

It assumed that either RTF_CLONED or RTF_CLONING must be set; however,
the assumption doesn't hold for userland programs that create a route
via RTM_ADD.

This fixes an issue that running rarpd causes the following kernel panic
reported by nonaka@:
panic: kernel diagnostic assertion "(la->la_flags & LLE_STATIC) == 0"
failed: file "/usr/src/sys/netinet/if_arp.c", line 1339
 1.191 20-Oct-2015  ozaki-r Stop callout in arp_rtrequest(RTM_DELETE)

This change fixes arptimer panic after removing an interface
(say by drvctl -d), which is reported by Takahiro Hayashi.

This change also fixes llentry's reference counting; we have
to take into account rtentry#rt_llinfo as well as arptimer.
 1.190 20-Oct-2015  ozaki-r Stop using softnet_lock (fix possible deadlock)

Using softnet_lock for mutual exclusion between lltable_free and
arptimer was wrong and had an issue causing a deadlock between
them; lltable_free waits arptimer completion by calling
callout_halt with softnet_lock that is held in arptimer, however
lltable_free also holds llentry's lock that is also held in
arptimer so arptimer never obtain the lock and both never go
forward eventually. We have to pass llentry's lock to
callout_halt instead.
 1.189 14-Oct-2015  roy In the event of an error within arpresolve(), delete the cloned route
otherwise it would never be deleted.
 1.188 14-Oct-2015  roy Save and clear the la route while we have a write lock
 1.187 13-Oct-2015  roy arpresolve() now returns 0 on success otherwise an error code.
Callers of arpresolve() now pass the error code back to their caller,
masking out EWOULDBLOCK.

This allows applications such as ping(8) to display a suitable error
condition.
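
A small sketch of what "masking out EWOULDBLOCK" in rev 1.187 could look like in a caller. The helper name `mask_ewouldblock()` is hypothetical; the assumption is that EWOULDBLOCK from the resolver means the packet was queued pending resolution, so the caller reports success, while any other error is passed up unchanged.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate the resolver's return value into the
 * value an output routine would hand back to its own caller.
 * EWOULDBLOCK is treated as "held for later", not as a failure.
 */
static int
mask_ewouldblock(int error)
{
	return error == EWOULDBLOCK ? 0 : error;
}
```

An application such as ping(8) would then see errors like EHOSTUNREACH or EHOSTDOWN, but never the transient EWOULDBLOCK.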
 1.186 13-Oct-2015  roy Move the NOARP check up a bit so that it works when an la is created
but hasn't been resolved yet.
Fixes PR kern/17611.
 1.185 13-Oct-2015  roy Simplify la handling in arpresolve() by asking arplookup() not to create
a la. If a la is needed arpresolve() will then create it or mark the
current la as writable.
 1.184 08-Oct-2015  roy Create a temporary define involving IFF_STATICARP if we have it
instead of just testing for __FreeBSD__.
No functional change.

ok: ozaki-r@
 1.183 07-Oct-2015  ozaki-r Create an llentry after fixing an interface to store

In case of RTF_LOCAL routes, we change an output interface
of a route from original one to lo0ifp. An llentry also
has to be stored to lo0ifp in such cases.

Problem reported by roy@
 1.182 05-Oct-2015  ozaki-r Fix arplookup logic

It should first lookup and then create an entry if not found (and if
creation is requested).
 1.181 11-Sep-2015  roy If, for whatever reason, a local interface route is removed and then
re-added, mark it as a local route.

While here, if changing the route to go via the loopback interface
remove any inherited MTU value.
 1.180 09-Sep-2015  ozaki-r Remove wrong KASSERT in arptfree

la_rt can be NULL because arptimer that calls arptfree doesn't always
free llentry so llentry can remain with la_rt == NULL. So we instead
check whether la_rt is NULL or not and do arptfree if not.

This fixes PR kern/50184 (confirmed by martin@) and
PR kern/50186 (maybe).
 1.179 09-Sep-2015  ozaki-r Revert v1.176 for further proper fix
 1.178 07-Sep-2015  ozaki-r CID 1322880: remove unnecessary m != NULL checks
 1.177 07-Sep-2015  ozaki-r CID 1322878: simplify log output flow
 1.176 02-Sep-2015  christos XXX: Disable KASSERT for now since locking is broken for interface removals.
 1.175 31-Aug-2015  ozaki-r Remove obsolete global variables and sysctl MIBs
 1.174 31-Aug-2015  ozaki-r Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
  - ARP specific data are stored in the hashed list
    of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
  - the global timer callout with the big locks can be
    removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
  - it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
  - it was a parameter that prevents expiration of active caches
  - Removed to simplify the timer logic, but we may be able to
    restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.173 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.172 12-Aug-2015  ozaki-r Move insane goto label
 1.171 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
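
The conversion between the two time bases described in rev 1.171 can be sketched as plain arithmetic. The function names and parameters below are hypothetical; the sketch only assumes that both clocks are sampled at the same instant, so that their difference is the current offset between wall-clock time and uptime.

```c
#include <stdint.h>

/*
 * Convert a timestamp kept in uptime seconds (monotonic) to wall-clock
 * seconds for export to userland. now_uptime and now_wall are the two
 * clocks sampled at the same moment.
 */
static int64_t
uptime_to_wallclock(int64_t t_uptime, int64_t now_uptime, int64_t now_wall)
{
	return t_uptime + (now_wall - now_uptime);
}

/* The inverse, for timestamps that userland passes in as wall-clock time. */
static int64_t
wallclock_to_uptime(int64_t t_wall, int64_t now_uptime, int64_t now_wall)
{
	return t_wall - (now_wall - now_uptime);
}
```

Because the offset is computed at conversion time, a later settimeofday(2) changes only what userland is told, not when the kernel-side expiry actually fires.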
 1.170 15-Jul-2015  ozaki-r Make global variables static
 1.169 22-May-2015  ozaki-r Replace NARC with NARCNET to follow renaming in 2007

Hmm, is anyone using this?
 1.168 21-May-2015  ozaki-r Use LIST_FOREACH{,_SAFE}

The first loop doesn't remove any items in it, so we can use
LIST_FOREACH instead of LIST_FOREACH_SAFE.
 1.167 21-May-2015  ozaki-r Use NULL instead of 0 for pointers
 1.166 21-May-2015  ozaki-r Make arp_init, in_revarpinput and revarprequest static
 1.165 16-May-2015  roy Separate ARP handling DAD from inet.
This is done by signalling the intent to try tentative addresses
and then clearing the intent once the address is setup.
When the ARP handler is installed (arp_ifinit) then it adds
dad start and stop functions to the address which are used instead
of calling ARP directly.
 1.164 03-May-2015  justin Rename delay variable as it shadows a global on arm.
 1.163 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.162 23-Mar-2015  roy Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.
 1.161 26-Feb-2015  roy Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.
 1.160 13-Nov-2014  christos branches: 1.160.2;
Add sysctl to selectively log arp packets from unknown network. (Adrien URBAN).
 1.159 05-Sep-2014  matt Deanonymize structure for llinfo_arp.
 1.158 03-Jun-2014  ozaki-r branches: 1.158.2; 1.158.4;
Call ifp->if_output in revarprequest with KERNEL_LOCK held

Otherwise, it hits KASSERT(KERNEL_LOCKED_P()) in ether_output
when nfs_boot fails and tries RARP.
 1.157 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.156 12-Apr-2014  gdt revarprequest: Avoid leaking mbuf.

In revarprequest, an mbuf could perhaps be leaked in an error path.
My reading of the code is that this is not possible, because ar_pro is
set to ETHERNET_IP, and ar_tha can only be null in the 1394 case.
But, better to have the free call anyway; ar_tha does not have a
documented interface contract :-)

Pointed out by Maxime Villard.
 1.155 25-Feb-2014  pooka branches: 1.155.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.154 02-Jan-2012  liamjfoy branches: 1.154.2; 1.154.6; 1.154.8; 1.154.10; 1.154.16;
Remove dead variable
 1.153 31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.152 27-Aug-2011  christos branches: 1.152.2; 1.152.6;
Add 3 logging sysctls for arp from freebsd:

1. log_movements: do you want to log the arp overwritten message or not?
2. log_wrong_iface: do you want to log when an arp arrives at the wrong
interface?
3. log_permanent_modify: do you want to log when an arp message attempts
to overwrite a static entry?

I did not call the sysctls log_arp like FreeBSD does, because we already
have an arp sysctl level. The default is on for all three of them.
 1.151 03-May-2011  dyoung arp_drain() may be called with locks held, so instead of doing any work
in arp_drain(), set a drain-needed flag. Do the work in the fasttimo
handler.

Contributed by Coyote Point Systems, Inc.
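
The deferred-work pattern from rev 1.151 can be sketched like this. All names (`drain_needed`, `drain_request_sketch()`, `timer_tick_sketch()`, `cached_entries`) are hypothetical, and C11 atomics stand in for whatever synchronization the kernel actually uses: the drain hook only records that work is wanted, and a periodic timer handler does the work later in a context where taking locks is safe.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool drain_needed;	/* set by the drain hook */
static int cached_entries = 3;		/* stand-in for the ARP cache */

/* May be called with arbitrary locks held: do no real work here. */
static void
drain_request_sketch(void)
{
	atomic_store(&drain_needed, true);
}

/*
 * Periodic timer handler. Consumes the flag atomically so that one
 * request results in exactly one drain. Returns true if it drained.
 */
static bool
timer_tick_sketch(void)
{
	if (!atomic_exchange(&drain_needed, false))
		return false;
	cached_entries = 0;	/* the actual purge would happen here */
	return true;
}
```

The cost of this design is latency: the cache is not released until the next timer tick, which is presumably acceptable for a memory-pressure hint.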
 1.150 01-Feb-2011  matt Add a new AF/PF_ROUTE which is 64-bit clean which makes the routing socket
interface (and its associated sysctls) act identically for both 32 and 64 bit
programs. The old unclean one remains for backward compatibility.
 1.149 20-Nov-2009  christos branches: 1.149.4; 1.149.6; 1.149.8;
ar_tha() can return NULL; treat this as an error.
 1.148 03-Nov-2009  christos Handle RFC 5227 ARP probes properly, don't drop 0.0.0.0 source packets
silently. (Patrik Lahti <plahti at qnx dot com>)
 1.147 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.146 12-Aug-2009  dyoung Don't require the gateway address to have room for both an interface
name and address. Room for an address will do. This should fix
a regression in 'arp -s ...' on interfaces such as xennet0 with
unusually long names.

I will request a pull-up to netbsd-5.
 1.145 11-Jan-2009  christos merge christos-time_t
 1.144 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.143 24-Oct-2008  dyoung branches: 1.143.2; 1.143.4; 1.143.6; 1.143.8;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.142 21-Oct-2008  ad arp_drain: no reason to complain if arp_lock is already held.
 1.141 28-Aug-2008  uebayasi Missing "\n" in log(9) messages.
 1.140 13-May-2008  dyoung branches: 1.140.4;
bzero -> memset, bcopy -> memcpy.
 1.139 13-May-2008  dyoung Cosmetic: use __arraycount(). s/0/NULL/ where appropriate. Pass
"null" instead of 0 to printf %s. Remove superfluous parentheses
in return statements. Compare pointers with NULL instead of "testing
truth."
 1.138 11-May-2008  dyoung Use memset() instead of Bzero().

In arplookup1(), put the static sockaddr_inarp onto the stack, and
zero it before use.
 1.137 04-May-2008  thorpej branches: 1.137.2;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.136 02-May-2008  ad PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.
 1.135 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.134 24-Apr-2008  ad branches: 1.134.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.133 23-Apr-2008  thorpej Use <net/net_stats.h> / netstat_sysctl().
 1.132 15-Apr-2008  thorpej branches: 1.132.2;
Make ARP stats per-cpu.
 1.131 20-Jan-2008  joerg branches: 1.131.6; 1.131.8;
Now that __HAVE_TIMECOUNTER and __HAVE_GENERIC_TODR are invariants,
remove the conditionals and the code associated with the undef case.
 1.130 07-Dec-2007  elad branches: 1.130.4;
Use struct initializers. No functional change.
 1.129 14-Nov-2007  cube branches: 1.129.2;
Follow up on arc -> arcnet renaming. Pointed out by joerg@.
 1.128 02-Sep-2007  dyoung branches: 1.128.4; 1.128.6;
We cannot sleep in a software interrupt, so do not sockaddr_dl_alloc(...,
M_WAITOK). Instead, sockaddr_dl_init() a sockaddr_dl on the stack.
 1.127 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.126 27-Aug-2007  dyoung branches: 1.126.2;
Reorganize and extract arplookup1() for code-sharing. Share
null_sdl. Introduce arp_setgate() for initializing a link-layer
nexthop, and use it to fulfill RTM_SETGATE requests.
 1.125 19-Jul-2007  dyoung branches: 1.125.4; 1.125.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.124 09-Jul-2007  ad branches: 1.124.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.123 12-Jun-2007  dyoung Complete removal of radix_node knowledge.
 1.122 09-Jun-2007  dyoung Get rid of radix_node_head.rnh_walktree, because it is only ever
set to rn_walktree.

Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.
 1.121 04-Mar-2007  christos branches: 1.121.2; 1.121.4; 1.121.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.120 22-Feb-2007  matt Fix lossage from boolean_t -> bool and updated x86 bus_dma.
 1.119 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.118 17-Feb-2007  dyoung branches: 1.118.2;
bcopy -> memcpy

Use NULL instead of (struct rtentry *)0.
 1.117 24-Nov-2006  christos branches: 1.117.2; 1.117.4; 1.117.8;
fix spelling of accidentally; from Zapher
 1.116 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.115 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.114 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.113 30-Aug-2006  christos branches: 1.113.2; 1.113.4;
Fix initializers
 1.112 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.111 25-May-2006  bouyer Make sure the mbuf is writable before trying to write to it.
 1.110 18-May-2006  liamjfoy branches: 1.110.2;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.109 12-May-2006  mrg since ar_tha() can return NULL, don't pass it directly to functions
that expect real addresses. explicitly KASSERT() that it is not
NULL in the kernel and just avoid using it in userland.

(the kernel could be more defensive about this, but, until now it
would have just crashed anyway.)
 1.108 24-Dec-2005  perry branches: 1.108.4; 1.108.6; 1.108.8; 1.108.12;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.107 11-Dec-2005  christos merge ktrace-lwp.
 1.106 20-Jun-2005  atatat branches: 1.106.2;
Change the rest of the sysctl subsystem to use const consistently.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
 1.105 01-Jun-2005  drochner need a "const"
 1.104 29-May-2005  christos - remove local copy of hexdigits.
- sprinkle const
- use mem*() instead of b*()
 1.103 26-Feb-2005  perry branches: 1.103.2; 1.103.4; 1.103.6;
nuke trailing whitespace
 1.102 02-Feb-2005  perry de-__P, do some ANSIfication.
 1.101 23-Jan-2005  matt branches: 1.101.2;
Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domains.
 1.100 04-Dec-2004  peter branches: 1.100.4;
Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.99 29-Sep-2004  christos PR/21902: Sean Boudreau: arplookup() incrementing arpstat.as_allocfail
erroneously.
 1.98 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.97 28-Apr-2004  ragge Send an arp request before the arp entry times out if the entry is active,
to avoid deleting active entries.
Add sysctl support to tune the default arp timeout values.
 1.96 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.95 21-Apr-2004  matt Constify if.c radix.c and route.c (and fix related fallout).
 1.94 24-Sep-2003  itojun on arplookup() failure, nuke cloned route - otherwise outsider could use massive
number of bogus ARPs for DoS attack. FreeBSD-SA-03:14.arp
 1.93 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.92 26-Feb-2003  matt branches: 1.92.2;
Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.91 20-Nov-2002  dyoung Squash a panic: do not try to print the name of a NULL interface.
 1.90 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.89 25-Jun-2002  enami If we need to fix up ar_hrd field, we must do it before using ar_tpa/tha.
 1.88 25-Jun-2002  itojun in arprequest(), fill ar_hrd only for IEEE1394. for other cases,
ifp->if_output will fill it for us.
 1.87 25-Jun-2002  enami No need to include same file twice.
 1.86 25-Jun-2002  enami Use if_addrlen macro rather than if_data.ifi_addrlen.
 1.85 24-Jun-2002  enami The ieee1394 arp reply should be broadcast.
 1.84 24-Jun-2002  enami Don't use a pointer before it is initialized.
 1.83 24-Jun-2002  itojun set ar_hrd for RFC-defined cases
 1.82 24-Jun-2002  itojun integrate IEEE1394 ARP into generic ARP logic.
XXX there's no check at all in ar_hrd, and we don't set ar_hrd on outgoing.
it seems like a bad thing.
 1.81 09-Jun-2002  itojun whitespace
 1.80 09-Jun-2002  itojun enforce IPv4 link MTU for FDDI and ARCNET even in RTF_GATEWAY case.
PR 17151.
 1.79 13-Nov-2001  lukem branches: 1.79.8; 1.79.10;
add RCSIDs
 1.78 20-Aug-2001  itojun if I'm bridging and got a packet to interface address on if A from if B,
advertise MAC address for if A with ARP reply.
 1.77 17-Aug-2001  thorpej Permit weaker interface matches for incoming ARP packets if the packet was
received on an interface that is part of a bridge and we find an ifaddr on
an interface that is part of the same bridge.
 1.76 04-Jul-2001  itojun branches: 1.76.2;
better support for multiple IPv4 addresses on a single interface.
- consider non-primary (2nd and beyond) IPv4 address as "local", and prevent
outgoing ARP.
- for routing entries generated by ARP, make sure to set rt->rt_ifa equal to
rt_key, to help IPv4 source address selection for traffic to myself.
PR 13311.

caveats/TODOs:
- interface routes ("connected routes" in cisco terminology) are tied to the
primary (1st) IPv4 address on the interface. should be fixed with updates
to rt_ifinit().
- source address selection for offlink locations. 1st address tend to be used
with the current code
(you can configure it right by setting rt->rt_ifa accordingly).
 1.75 11-Jun-2001  tron Make arplookup error messages more informative. Patch supplied by
Andrew Brown in PR kern/13162.
 1.74 14-May-2001  matt Use the LIST_NEXT & LIST_FIRST macros instead of referring to
le_next & lh_first.
 1.73 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.72 26-Jan-2001  is branches: 1.72.2;
Make diagnostic actually useful - needed to debug other ARP PRs.
Suggested by Geoff C. Wing in PR 10815.
 1.71 17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.70 15-Aug-2000  jhawk Add kernel counters for arp events, displayable with netstat -s -f arp
 1.69 20-May-2000  jhawk branches: 1.69.4;
Install "show arptab" (db_show_arptab) in the ddb command tree.
Move prototype from netinet/if_inarp.h to ddb/db_interface.h.
Change function to have standard ddb parameters (though they're
ignored).
 1.68 30-Mar-2000  augustss Remove register declarations.
 1.67 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.66 25-Sep-1999  is branches: 1.66.2;
Decouple IP mtu for ARCnet devices from interface MTU.
This is important, because for most protocols, link level fragmentation is
used, but with different default effective MTUs. (e.g.: IPv4 default MTU
is 1500 octets, IPv6 default MTU is 9072 octets).
 1.65 21-Aug-1999  matt Cleanup a little kludge in mtu handling in route.c. Bring down FDDI
mtu to legal IP max but don't affect other protocols.
 1.64 05-Aug-1999  sommerfeld Delete debug printfs from arp_drain()
 1.63 05-Aug-1999  sommerfeld Implement arp_drain(), which frees packets tied up in the arp cache if
mbufs are in short supply.
Create a (trivial) protocol domain for arp so that the drain routine will
be called from m_reclaim()
 1.62 18-Jun-1999  thorpej When sending an ARP reply, make sure to set the length of the outgoing
packet.

Slightly modified from PR #7809, Zdenek Salvet <salvet@ics.muni.cz>.
 1.61 30-May-1999  bad Fix thinko of mine in previous. The source route info is not at m->m_data
after various m_adj()s have been done. Kludge around this with a cheesy
macro that knows where the drivers put the mac header in the first mbuf.

XXX There should be a better way to do this.
 1.60 29-May-1999  bad Don't assume the Token-Ring source route is in the m_pktdat. Use
m_data instead. This isn't a problem with ARP packets but is the correct
way to do this.

Noticed by pmara@cactus.org (Shashi Mara).
 1.59 23-May-1999  ad For completeness sake, allow this to compile with no loopback interfaces
configured.
 1.58 04-May-1999  is Fixes PR 7489 by Olaf Seibert. Fix by Zdenek Salvet (PR 7497).
 1.57 04-May-1999  is Fix for PR 7490 by Olaf Seibert, fix mostly from PR 7497 by Zdenek Salvet,
but with more verbose error messages.
 1.56 22-Mar-1999  bad branches: 1.56.2;
Add support for Token-Ring source routes in the ARP cache.

By Onno van der Linden.
 1.55 21-Feb-1999  drochner -always do an RARP if revarpwhoarewe() is called, it might be for another
interface or the server's configuration has changed
-g/c revarpwhoami()
 1.54 19-Dec-1998  thorpej Reverse the copyright-notice-swap. It went against existing practice.
 1.53 01-Oct-1998  drochner branches: 1.53.4;
print reason for arplookup() failure (ala FreeBSD)
 1.52 30-Sep-1998  tls Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.
 1.51 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.50 04-Jul-1998  jonathan defopt DDB.
 1.49 02-Jul-1998  is Thinko in last fix: we have to actually check each address for a copy on
our ifp, else we might fail for some strange configurations.
 1.48 02-Jul-1998  is The rewrite of if_arp.c to work with the hashed interface address lists
(1.44) missed a test for the right interface, making some machines answer
to some bogus arp requests (like for WHO-HAS 127.0.0.1).

The quick patch in 1.46-1.47 does not work for so-called "unnumbered"
interfaces, that is, (point-to-point) interfaces that share their local
address with another (e.g., the Ethernet) interface.

We add a macro to in_var.h, to step (in the current implementation) through
the hash chain and find more entries with the same address, and use that
in if_arp.c to find one which belongs to our interface.
 1.47 25-Jun-1998  tls Fix buglet where we might respond to arp on wrong interface.
 1.46 29-May-1998  matt Change arp so its console log messages print out IP addresses in
dotted quad format instead of hex.
 1.45 15-Feb-1998  tls Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.
 1.44 13-Feb-1998  tls Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
 1.43 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.42 31-Oct-1997  gwr Get rid of the messages: "arp: zero IP addr from ..."
If one really wants to know about those confused PCs
trying to use IP address zero, they can use tcpdump.
 1.41 02-Oct-1997  is branches: 1.41.2;
Reimplement a test for broadcast addresses advertized, which was left out
when rewriting the ARP system.
 1.40 29-Aug-1997  gwr Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)
 1.39 04-Aug-1997  lukem struct loif is an array of NLOOP (from "loop.h") elements
 1.38 27-May-1997  gwr branches: 1.38.4;
Allow revarpwhoami() to be called multiple times.
(Just return the answer if we already have it.)
Without this, the RB_ASKNAME loop fails on every
call to nfs_mountroot after the first call.
 1.37 07-Apr-1997  jtk add newlines at end of debugging log messages which were missing them
 1.36 23-Mar-1997  is Fix several bugs related to the new ARP code, and ARCnet ARP support.
Among others, add ARPHRD_ARCNET definition, make sure the hardware type is
set on outgoing ARP packets, make sure we don't send out replies as broadcasts.
 1.35 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.34 13-Oct-1996  christos branches: 1.34.4;
backout previous kprintf changes
 1.33 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.32 09-Oct-1996  thorpej Merge netbsd-1-2 branch back into mainline.
 1.31 11-May-1996  mycroft branches: 1.31.4;
When sending an ARP request, use the interface address for the route, rather
than the first address assigned. This gives slightly different behaviour in
the presence of aliases. From Bill Fenner, via Pete Bentley.
 1.30 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.29 30-Mar-1996  christos Fix db_printf formats
 1.28 13-Feb-1996  christos netinet prototypes
 1.27 12-Aug-1995  mycroft splnet --> splsoftnet
 1.26 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.25 04-Jun-1995  mycroft Clean up many more casts.
 1.24 15-May-1995  cgd spacing fixups and KNF. #define ether address size, so it's not
hardcoded as '6' all over.
 1.23 17-Apr-1995  cgd spacing cleanup. also, minor type mixup fixups.
 1.22 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.21 11-Apr-1995  mycroft Remove some explicit references to loif.
 1.20 07-Apr-1995  mycroft Add a common function to initialize ARP-related variables. `Inspired'
by Garrett Wollman.
 1.19 06-Mar-1995  glass remove references to arptnew. fix spelling error
 1.18 27-Jul-1994  mycroft Fix byte-order bug in printf() statement.
 1.17 24-Jul-1994  cgd kill conflicting externs
 1.16 29-Jun-1994  cgd branches: 1.16.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.15 21-Jun-1994  chopps config.new hack for if_ether.c for lack of an `and' in the grammar
and protect some ether specific code in in.c
 1.14 04-Jun-1994  gwr Back out some of my changes which Keith Sklower convinced me are
unnecessary. Leaving in just the essentials of the fix.
 1.13 03-Jun-1994  gwr Avoid accidentally creating permanent entries at time==0
Routes created with RTM_ADD (i.e. manually added) are
permanent so leave their expiration time set to zero.
 1.12 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.11 29-Apr-1994  cgd change timeout/untimeout/wakeup/sleep/tsleep args to void *
 1.10 18-Apr-1994  mycroft Dummy arpintr() for now.
 1.9 18-Apr-1994  glass revised nfs diskless support. uses bootp+rpc to gather parameters
 1.8 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.7 23-Jan-1994  deraadt ether_output() & ether_input() take ether_type as a net-short.
AF_UNSPEC does not swap byte order of ether_type.
NOTE: this requires driver changes
 1.6 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 12-Dec-1993  hpeyerl From cmaeda@cs.washington.edu; part of the multicast patches derived
from the Multicast patches for BSDI.

Thanx to Brad Parker for making me realize i'd forgotten to commit
this patch..(color me dopey)
 1.3 27-Jun-1993  andrew branches: 1.3.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.2 20-May-1993  cgd more rcsid additions and file header cleanups
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.16.2.2 28-Jul-1994  cgd from trunk.
 1.16.2.1 24-Jul-1994  cgd from trunk.
 1.31.4.2 31-Aug-1996  thorpej Delete these from the NetBSD 1.2 release branch; they were tagged in error.
 1.31.4.1 14-Jun-1996  is Not needed for the netbsd-1-2 release
 1.34.4.1 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.38.4.3 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.38.4.2 01-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.38.4.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.41.2.2 01-Oct-1998  cgd pull up revisions 1.43-1.45, 1.47-1.49, 1.52 from trunk. (tls)
 1.41.2.1 31-Oct-1997  mellon Pull rev 1.42 up from trunk (gwr)
 1.53.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.56.2.4 02-Jul-2000  he Apply patch (partially revision 1.63, requested by sommerfeld):
Protect arp table handling with splnet() to avoid interrupt races
when ip_flow is in use. Fixes PR#10351.
 1.56.2.3 20-Jun-1999  perry pullup 1.61->1.62 (thorpej)
 1.56.2.2 04-May-1999  perry branches: 1.56.2.2.2; 1.56.2.2.4;
pullup 1.57->1.58 (is)
 1.56.2.1 04-May-1999  perry pullup 1.56->1.57 (is)
 1.56.2.2.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.56.2.2.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.66.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.66.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.66.2.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.66.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.69.4.1 01-Oct-2003  msaitoh Pull up revision 1.94 via patch (requested by itojun in ticket #86):
On arplookup() failure, nuke cloned route - otherwise outsider could use
massive number of bogus ARPs for DoS attack. FreeBSD-SA-03:14.arp
 1.72.2.7 11-Dec-2002  thorpej Sync with HEAD.
 1.72.2.6 11-Nov-2002  nathanw Catch up to -current
 1.72.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.72.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.72.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.72.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.72.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.76.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.76.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.76.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.76.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.79.10.1 01-Oct-2003  tron Pull up revision 1.94 via patch (requested by itojun in ticket #1482):
on arplookup() failure, nuke cloned route - otherwise outsider could use massive
number of bogus ARPs for DoS attack. FreeBSD-SA-03:14.arp
 1.79.8.2 15-Jul-2002  gehenna catch up with -current.
 1.79.8.1 20-Jun-2002  gehenna catch up with -current.
 1.92.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.92.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.92.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.92.2.6 24-Jan-2005  skrll Sync with HEAD.
 1.92.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.92.2.4 19-Oct-2004  skrll Sync with HEAD
 1.92.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.92.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.92.2.1 03-Aug-2004  skrll Sync with HEAD
 1.100.4.1 29-Apr-2005  kent sync with -current
 1.101.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.101.2.1 12-Feb-2005  yamt sync with head.
 1.103.6.1 29-Aug-2008  bouyer Pull up following revision(s) (requested by uebayasi in ticket #1956):
sys/netinet/if_arp.c: revision 1.141
Missing "\n" in log(9) messages.
 1.103.4.1 29-Aug-2008  bouyer Pull up following revision(s) (requested by uebayasi in ticket #1956):
sys/netinet/if_arp.c: revision 1.141
Missing "\n" in log(9) messages.
 1.103.2.1 29-Aug-2008  bouyer Pull up following revision(s) (requested by uebayasi in ticket #1956):
sys/netinet/if_arp.c: revision 1.141
Missing "\n" in log(9) messages.
 1.106.2.6 21-Jan-2008  yamt sync with head
 1.106.2.5 15-Nov-2007  yamt sync with head.
 1.106.2.4 03-Sep-2007  yamt sync with head.
 1.106.2.3 26-Feb-2007  yamt sync with head.
 1.106.2.2 30-Dec-2006  yamt sync with head.
 1.106.2.1 21-Jun-2006  yamt sync with head.
 1.108.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.108.8.3 03-Sep-2006  yamt sync with head.
 1.108.8.2 26-Jun-2006  yamt sync with head.
 1.108.8.1 24-May-2006  yamt sync with head.
 1.108.6.2 01-Jun-2006  kardel Sync with head.
 1.108.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.108.4.1 09-Sep-2006  rpaulo sync with head
 1.110.2.1 19-Jun-2006  chap Sync with head.
 1.113.4.2 10-Dec-2006  yamt sync with head.
 1.113.4.1 22-Oct-2006  yamt sync with head
 1.113.2.2 12-Jan-2007  ad Sync with head.
 1.113.2.1 18-Nov-2006  ad Sync with head.
 1.117.8.1 29-Aug-2008  bouyer Pull up following revision(s) (requested by uebayasi in ticket #1191):
sys/netinet/if_arp.c: revision 1.141
Missing "\n" in log(9) messages.
 1.117.4.1 04-Sep-2008  skrll Sync with netbsd-4.
 1.117.2.1 29-Aug-2008  bouyer Pull up following revision(s) (requested by uebayasi in ticket #1191):
sys/netinet/if_arp.c: revision 1.141
Missing "\n" in log(9) messages.
 1.118.2.3 12-Mar-2007  rmind Sync with HEAD.
 1.118.2.2 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.118.2.1 17-Feb-2007  yamt file if_arp.c was added on branch yamt-idlelwp on 2007-02-27 16:54:52 +0000
 1.121.6.1 09-Dec-2007  reinoud Pullup to HEAD
 1.121.4.1 11-Jul-2007  mjf Sync with head.
 1.121.2.4 09-Oct-2007  ad Sync with head.
 1.121.2.3 20-Aug-2007  ad Sync with HEAD.
 1.121.2.2 15-Jul-2007  ad Sync with head.
 1.121.2.1 01-Jul-2007  ad Adapt to callout API change.
 1.124.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.124.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.125.6.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.125.6.1 19-Jul-2007  dyoung file if_arp.c was added on branch matt-mips64 on 2007-07-19 20:48:54 +0000
 1.125.4.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.125.4.2 14-Nov-2007  joerg Sync with HEAD.
 1.125.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.126.2.3 23-Mar-2008  matt sync with HEAD
 1.126.2.2 09-Jan-2008  matt sync with HEAD
 1.126.2.1 06-Nov-2007  matt sync with HEAD
 1.128.6.3 18-Feb-2008  mjf Sync with HEAD.
 1.128.6.2 08-Dec-2007  mjf Sync with HEAD.
 1.128.6.1 19-Nov-2007  mjf Sync with HEAD.
 1.128.4.1 18-Nov-2007  bouyer Sync with HEAD
 1.129.2.1 08-Dec-2007  ad Sync with head.
 1.130.4.1 23-Jan-2008  bouyer Sync with HEAD.
 1.131.8.4 04-Jan-2009  christos fix time_t format
 1.131.8.3 09-Nov-2008  christos merge with head.
 1.131.8.2 01-Nov-2008  christos Sync with head.
 1.131.8.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.131.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.131.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.131.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.132.2.1 18-May-2008  yamt sync with head.
 1.134.2.4 11-Mar-2010  yamt sync with head
 1.134.2.3 19-Aug-2009  yamt sync with head.
 1.134.2.2 04-May-2009  yamt sync with head.
 1.134.2.1 16-May-2008  yamt sync with head.
 1.137.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.137.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.140.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.140.4.1 19-Oct-2008  haad Sync with HEAD.
 1.143.8.2 11-May-2010  matt Fix printf for u_quad_t route changes.
 1.143.8.1 21-Apr-2010  matt sync to netbsd-5
 1.143.6.1 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.143.4.2 21-Nov-2009  snj Pull up following revision(s) (requested by christos in ticket #1156):
sys/net/if_arcsubr.c: revision 1.61
sys/net/if_ethersubr.c: revision 1.173
sys/net/if_fddisubr.c: revision 1.78
sys/net/if_tokensubr.c: revision 1.58 via patch
sys/netinet/if_arp.c: revision 1.149
ar_tha() can return NULL; treat this as an error.
 1.143.4.1 05-Sep-2009  bouyer Pull up following revision(s) (requested by dyoung in ticket #911):
sys/netinet/if_arp.c: revision 1.146
Don't require the gateway address to have room for both an interface
name and address. Room for an address will do. This should fix
a regression in 'arp -s ...' on interfaces such as xennet0 with
unusually long names. Fix PR #41878.
 1.143.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.149.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.149.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.149.4.2 31-May-2011  rmind sync with head
 1.149.4.1 05-Mar-2011  rmind sync with head
 1.152.6.1 18-Feb-2012  mrg merge to -current.
 1.152.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.152.2.1 17-Apr-2012  yamt sync with head
 1.154.16.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1432):
sys/netinet/if_arp.c: 1.238, 1.239 via patch
Make sure the protocol address length equals that of IPv4. Also, make sure
the hardware address length equals that of the interface we received the
packet on. Otherwise a packet could easily set them both to zero and make
the kernel read beyond the allocated mbuf, which is terrible.
Note: for the latter we drop the packet instead of replying, since it is
malformed.
Note: I also added an ugly hack in CARP, since it apparently expects at
least six bytes.
--
Add some checks, mostly same as in_arpinput.
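
The length checks described above can be sketched in userland C as follows. This is a minimal illustration under stated assumptions, not the code from if_arp.c: `struct arp_fixed_hdr`, `arp_lengths_ok()` and `IPV4_ADDR_LEN` are hypothetical names, and the packet is modeled as a flat buffer length rather than an mbuf.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ARP fixed header; not the kernel's struct arphdr. */
struct arp_fixed_hdr {
	uint16_t hrd;	/* hardware type */
	uint16_t pro;	/* protocol type */
	uint8_t  hln;	/* hardware address length claimed by the packet */
	uint8_t  pln;	/* protocol address length claimed by the packet */
	uint16_t op;	/* operation */
};

#define IPV4_ADDR_LEN 4	/* assumption: sizeof(struct in_addr) */

/*
 * Return true only if the claimed lengths match IPv4 and the receiving
 * interface, and the buffer is long enough to hold all four addresses.
 */
static bool
arp_lengths_ok(const struct arp_fixed_hdr *ah, size_t pkt_len,
    uint8_t if_addrlen)
{
	size_t need;

	if (pkt_len < sizeof(*ah))
		return false;
	if (ah->pln != IPV4_ADDR_LEN)
		return false;	/* protocol length must be that of IPv4 */
	if (ah->hln != if_addrlen)
		return false;	/* hardware length must match the interface */

	/* sender hw + sender proto + target hw + target proto */
	need = sizeof(*ah) + 2 * ((size_t)ah->hln + ah->pln);
	return pkt_len >= need;
}
```

A packet that sets both lengths to zero is rejected by the equality checks before any address bytes are read, which is the failure mode the message describes.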
 1.154.16.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.154.10.2 18-May-2014  rmind sync with head
 1.154.10.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.154.8.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1432):
sys/netinet/if_arp.c: 1.238, 1.239 via patch
Make sure the protocol address length equals that of IPv4. Also, make sure
the hardware address length equals that of the interface we received the
packet on. Otherwise a packet could easily set them both to zero and make
the kernel read beyond the allocated mbuf, which is terrible.
Note: for the latter we drop the packet instead of replying, since it is
malformed.
Note: I also added an ugly hack in CARP, since it apparently expects at
least six bytes.
--
Add some checks, mostly same as in_arpinput.
 1.154.8.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.154.6.2 03-Dec-2017  jdolecek update from HEAD
 1.154.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.154.2.3 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1432):
sys/netinet/if_arp.c: 1.238, 1.239 via patch
Make sure the protocol address length equals that of IPv4. Also, make sure
the hardware address length equals that of the interface we received the
packet on. Otherwise a packet could easily set them both to zero and make
the kernel read beyond the allocated mbuf, which is terrible.
Note: for the latter we drop the packet instead of replying, since it is
malformed.
Note: I also added an ugly hack in CARP, since it apparently expects at
least six bytes.
--
Add some checks, mostly same as in_arpinput.
 1.154.2.2 15-Nov-2015  bouyer Pull up following revision(s) (requested by ozaki-r in ticket #1328):
sys/netinet/if_arp.c: revision 1.160
Add sysctl to selectively log arp packets from unknown network. (Adrien URBAN).
 1.154.2.1 03-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.155.2.1 10-Aug-2014  tls Rebase.
 1.158.4.1 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1356):
sys/netinet/if_arp.c: revision 1.238, 1.239 via patch
Make sure the protocol address length equals that of IPv4. Also, make sure
the hardware address length equals that of the interface we received the
packet on. Otherwise a packet could easily set them both to zero and make
the kernel read beyond the allocated mbuf, which is terrible.
Note: for the latter we drop the packet instead of replying, since it is
malformed.
Note: I also added an ugly hack in CARP, since it apparently expects at
least six bytes.
--
Add some checks, mostly same as in_arpinput.
 1.158.2.2 05-Feb-2017  snj Pull up following revision(s) (requested by maxv in ticket #1356):
sys/netinet/if_arp.c: revision 1.238, 1.239 via patch
Make sure the protocol address length equals that of IPv4. Also, make sure
the hardware address length equals that of the interface we received the
packet on. Otherwise a packet could easily set them both to zero and make
the kernel read beyond the allocated mbuf, which is terrible.
Note: for the latter we drop the packet instead of replying, since it is
malformed.
Note: I also added an ugly hack in CARP, since it apparently expects at
least six bytes.
--
Add some checks, mostly same as in_arpinput.
 1.158.2.1 06-Nov-2015  riz branches: 1.158.2.1.2;
Pull up following revision(s) (requested by ozaki-r in ticket #985):
sys/netinet/if_arp.c: revision 1.160
Add sysctl to selectively log arp packets from unknown network. (Adrien URBAN).
 1.158.2.1.2.1 13-Mar-2017  skrll Sync with netbsd-7-1-RELEASE
 1.160.2.12 28-Aug-2017  skrll Sync with HEAD
 1.160.2.11 05-Feb-2017  skrll Sync with HEAD
 1.160.2.10 05-Dec-2016  skrll Sync with HEAD
 1.160.2.9 05-Oct-2016  skrll Sync with HEAD
 1.160.2.8 09-Jul-2016  skrll Sync with HEAD
 1.160.2.7 29-May-2016  skrll Sync with HEAD
 1.160.2.6 22-Apr-2016  skrll Sync with HEAD
 1.160.2.5 19-Mar-2016  skrll Sync with HEAD
 1.160.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.160.2.3 22-Sep-2015  skrll Sync with HEAD
 1.160.2.2 06-Jun-2015  skrll Sync with HEAD
 1.160.2.1 06-Apr-2015  skrll Sync with HEAD
 1.217.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.217.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.217.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.217.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.217.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.217.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.233.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.248.4.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.250.2.10 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1396):

sys/netinet6/nd6.h: revision 1.88
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.288 (patch)

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.250.2.9 06-Nov-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1080):

sys/netinet6/nd6.c: revision 1.251
sys/netinet/if_arp.c: revision 1.276
sys/net/if.c: revision 1.438
sys/net/if.c: revision 1.439
sys/net/route.c: revision 1.214
sys/net/route.c: revision 1.215
sys/net/route.c: revision 1.216
sys/netinet6/in6.c: revision 1.270
sys/net/route.h: revision 1.120
sys/net/if.c: revision 1.440

Remove a wrong assertion in ifaref

-

Doing ifref on an ifa with IFA_DESTROYING is not a problem; the reference should
be dropped during the destruction of the ifa.

-

Use atomic operations for ifa_refcnt

-

Avoid a dangling pointer during rt_replace_ifa

-

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.

-

Use rt_update framework on updating a rtentry
 1.250.2.8 02-Apr-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #686):

sys/netinet/if_arp.c: revision 1.271
sys/netinet6/nd6_nbr.c: revision 1.151,1.152

Avoid passing NULL to nd6_dad_duplicated
Fix PR kern/53075

Fix a race condition on DAD destructions (again)

The previous fix to DAD timers was wrong; it avoided a use-after-free but
instead introduced a memory leak. The destruction method had delegated
a destruction of a DAD timer to the timer itself and told that by setting NULL
to dp->dad_ifa. However, the previous fix made DAD timers do nothing on
the sign.

Fixing the issue with using callout_stop isn't easy. One approach is to have
a refcount on dp but it introduces extra complexity that we want to avoid.
The new fix falls back to using callout_halt, which was abandoned because of
softnet_lock. Fortunately now the network stack is protected by KERNEL_LOCK
so we can remove softnet_lock from DAD timers (callout) and use callout_halt
safely.
 1.250.2.7 13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #622):
sys/netinet/if_arp.c: revision 1.270
sys/net/if_llatbl.c: revision 1.24 (patch)
sys/net/if_llatbl.c: revision 1.25
sys/net/if_llatbl.c: revision 1.26
sys/net/route.c: revision 1.204
sys/netinet6/in6.c: revision 1.261
sys/netinet6/in6.c: revision 1.262 (patch)
sys/netinet6/in6.c: revision 1.263
sys/netinet/in.c: revision 1.216
sys/netinet6/in6.c: revision 1.264
sys/netinet6/nd6.c: revision 1.246 (patch)
sys/netinet/if_arp.c: revision 1.269
sys/net/if_llatbl.h: revision 1.14
sys/netinet6/in6.c: revision 1.259
sys/netinet/in.c: revision 1.220
sys/netinet/in.c: revision 1.221 (patch)
sys/netinet/in.c: revision 1.222
sys/netinet/in.c: revision 1.223

Suppress noisy debugging outputs
Even if DEBUG they are too noisy under load.

Tweak sanity checks

Scheduling a timer of static entries is wrong.

Add assertions

We must not destroy llentries holding mbufs.

Fix reference leaks of llentry
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.

Fix memory leaks on arp -d and ndp -d for static entries
We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.

Use pool(9) for llentry allocations
llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.250.2.6 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #589):
sys/netinet/if_arp.c: revision 1.267
sys/netinet6/nd6_nbr.c: revision 1.146-1.148

Use KASSERT for checking a programming error

Simplify; pass dp to nd6_dad_duplicated instead of looking it up again in it

Avoid a race condition of DAD timer destructions

When we see dp->dad_ifa == NULL, it means that the ifa is being deleted and also
the callout is scheduled again by someone. We shouldn't rely on a result of
callout_pending to know if the callout is scheduled because it returns false if
the subsequent callout handler is already on the fly.
We have to always delegate the destruction of dp to the subsequent handler
unconditionally if dp->dad_ifa == NULL. Otherwise, the first handler destroys
the dp and the second handler tries to handle destroyed dp.
 1.250.2.5 26-Jan-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #511):
sys/kern/kern_timeout.c: revision 1.54
sys/netinet6/nd6_nbr.c: revision 1.141
sys/netinet6/nd6_nbr.c: revision 1.144
sys/netinet/if_arp.c: revision 1.256
Fix a deadlock on callout_halt of nd6_dad_timer
We must not call callout_halt of nd6_dad_timer with holding nd6_dad_lock because
the lock is taken in nd6_dad_timer. Once softnet_lock goes away, we can pass the
lock to callout_halt, but for now we cannot.
Make DAD destructions (MP-)safe with callout_stop
arp_dad_stoptimer and nd6_dad_stoptimer can be called with or without
softnet_lock held and unfortunately we have no easy way to statically know which.
So it is hard to use callout_halt there.
To address the situation, we use callout_stop to make the code safe. The new
approach copes with the issue by delegating the destruction of a callout to
the callout itself, which allows us to avoid waiting for the callout to finish. This can be
done because DAD objects are separated from other data such as ifa.
The approach is suggested by riastradh@
Proposed on tech-kern@ and tech-net@
Sanity-check if interlock is held when it's passed
 1.250.2.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
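
The idea of hiding the NET_MPSAFE switch behind macros can be sketched like this. It is a userland illustration only: `KERNEL_LOCK` and `KERNEL_UNLOCK_ONE` are replaced by counter-based stand-ins, and the wrapper macro names and `output_path_lock_depth()` are assumptions modeled on the description rather than quoted from the tree.

```c
#include <stddef.h>

/* Userland stand-ins for the kernel primitives (hypothetical). */
static int kernel_lock_depth;
#define KERNEL_LOCK(n, l)	((void)(kernel_lock_depth += (n)))
#define KERNEL_UNLOCK_ONE(l)	((void)(kernel_lock_depth -= 1))

/* The #ifdef lives here once instead of at every call site. */
#ifdef NET_MPSAFE
#define KERNEL_LOCK_UNLESS_NET_MPSAFE()		do { } while (0)
#define KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define KERNEL_LOCK_UNLESS_NET_MPSAFE()		KERNEL_LOCK(1, NULL)
#define KERNEL_UNLOCK_UNLESS_NET_MPSAFE()	KERNEL_UNLOCK_ONE(NULL)
#endif

/* Hypothetical caller: reports the lock depth seen inside the section. */
static int
output_path_lock_depth(void)
{
	int depth;

	KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth = kernel_lock_depth;	/* 1 without NET_MPSAFE, 0 with it */
	KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return depth;
}
```

Call sites then carry no preprocessor conditionals, so the remaining uses of the lock can be found by searching for the wrapper names.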
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.250.2.3 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #353):
sys/net/if_llatbl.c: 1.22
sys/net/if_llatbl.h: 1.13
sys/netinet/if_arp.c: 1.254
sys/netinet/in.c: 1.208-1.209
sys/netinet6/in6.c: 1.249-1.250
sys/netinet6/nd6.c: 1.237
Remove redundant KASSERTMSG
The function is static, has just one caller and the caller does the same check.
--
Fix a deadlock between a route update and lltable
It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.
If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.
A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update
Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.
Note that the deadlock happens only if NET_MPSAFE is enabled.
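
The cycle in the graph above can be checked mechanically. The following is a small sketch, not kernel code: each party is a node, `waits_for[i]` names the node that `i` is blocked on, and the enum names and `in_wait_cycle()` are hypothetical.

```c
#include <stdbool.h>

enum { ROUTE_UPDATE, SOFTINT, WLOCK_HOLDER, NNODES };

/*
 * waits_for[i] is the node that i is blocked on, or -1 if i can run.
 * Return true if following the wait edges from start leads back to start.
 */
static bool
in_wait_cycle(const int waits_for[], int n, int start)
{
	int cur = start;

	for (int steps = 0; steps < n; steps++) {
		cur = waits_for[cur];
		if (cur < 0)
			return false;	/* someone can make progress */
		if (cur == start)
			return true;	/* came back around: deadlock */
	}
	return false;
}
```

With the edges from the message (route update waits for the softint, the softint waits for the IF_AFDATA_WLOCK holder, and the holder waits for the route update inside rtalloc1) the function reports a cycle. Once rtalloc1 is moved outside the locked region, the holder no longer waits on anything and the cycle disappears.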
 1.250.2.2 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
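
The "fetch all, then remove every match" behavior can be illustrated with a flat table. This is a sketch under assumptions, not the arp(8) or ndp(8) source: `struct l2_entry` and `remove_matching()` are hypothetical, and a real implementation would issue a delete request per matched entry rather than compact an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one fetched ARP/NDP entry. */
struct l2_entry {
	uint32_t dst;		/* destination address */
	unsigned ifindex;	/* interface the entry lives on */
};

/*
 * Remove every entry whose destination equals dst, regardless of
 * interface. Returns the number of entries kept, compacted in place.
 */
static size_t
remove_matching(struct l2_entry *tab, size_t n, uint32_t dst)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (tab[i].dst != dst)
			tab[kept++] = tab[i];
	}
	return kept;
}
```

Because the match is on destination only, two entries for the same address on different interfaces are both removed in one pass.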
Check if ARP/NDP entries are purged when a related route is deleted
 1.250.2.1 01-Jul-2017  snj Pull up following revision(s) (requested by roy in ticket #77):
sys/net/if.h: revision 1.240
sys/netinet/if_arp.c: revision 1.253
sys/net/if.c: revision 1.395
Introduce if_get_bylla to find an interface with the active
local link address.
--
Use if_get_bylla() instead of just looking at the lla of the interface
the address belongs to.
This allows any ARP message we received from another interface to
be correctly dropped.
While here, move the protocol length check higher up the food chain.
 1.268.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.268.2.5 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.268.2.4 21-May-2018  pgoyette Sync with HEAD
 1.268.2.3 02-May-2018  pgoyette Synch with HEAD
 1.268.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.268.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.275.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.275.2.1 10-Jun-2019  christos Sync with HEAD
 1.282.2.7 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1883):

tests/net/arp/t_dad.sh: revision 1.16
sys/netinet/in.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.46
sys/netinet/if_arp.c: revision 1.314

arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
the own address, i.e., address duplication, the original behavior of
a kernel prior to supporting IPv4 DAD is to send an ARP reply. It is
the same with the latest kernel with DAD enabled. However, the latest
kernel without DAD sends back a GARP packet. Restore the original
behavior.

inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.

tests, arp: add tests of address duplications without DAD

tests, arp: add tests for GARP on link up
 1.282.2.6 24-Jan-2020  martin Pull up following revision(s) (requested by roy in ticket #645):

sys/netinet/if_arp.c: revision 1.292

arp: find source address then target address when processing input

This fixes the case where another host having a duplicate ip address
starts using it right away without probing for its availability.

While here, prefer ifatoia over a strict cast.
 1.282.2.5 11-Oct-2019  martin Pull up following revision(s) (requested by roy in ticket #300):

sys/netinet/if_arp.c: revision 1.289

ARP: Don't defend ARP probes.

We should let the nature of ARP take its course here when our address
is neither tentative nor duplicated.
This allows the host to work with ARP ping, which was broken in r1.279.
 1.282.2.4 30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #269):

sys/netinet6/nd6.h: revision 1.88
sys/net/rtsock_shared.c: revision 1.10
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.283
sys/netinet/if_arp.c: revision 1.288

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.

-

Initialize dom_mowner for MBUFTRACE
 1.282.2.3 05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #170):

sys/netinet/if_arp.c: revision 1.287

inet: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.

This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This tells us when a new lladdr has been added (RTM_ADD),
changed (RTM_CHANGE), deleted (RTM_DELETE) or has failed to be
resolved (RTM_MISS). The latter case can be interpreted as unreachable.
 1.282.2.2 01-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #147):

sys/netinet/if_arp.c: revision 1.285
sys/netinet/if_arp.c: revision 1.286

ARP: remove unused sysctl entry log_unknown_network

ARP: change default sysctl entry log_movements to 0
IP address sharing is a thing and shouldn't cause needless diagnostics
by default.
 1.282.2.1 26-Aug-2019  martin Pull up following revision(s) (requested by roy in ticket #109):

sys/net/route.h: revision 1.124
sys/netinet6/nd6.c: revision 1.258
sys/netinet6/nd6.c: revision 1.259
sys/net/rtsock.c: revision 1.251
sys/netinet/if_arp.c: revision 1.284
sys/netinet6/nd6_nbr.c: revision 1.167

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9

-

nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.289.2.1 25-Jan-2020  ad Sync with head.
 1.297.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.311.2.3 13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #859):

tests/net/arp/t_arp.sh: revision 1.47
tests/net/arp/t_arp.sh: revision 1.48
sys/netinet/if_arp.c: revision 1.315

arp: allow to send packets without an ARP resolution just after
receiving an ARP request

On receiving an ARP request, the current implementation creates an ARP
cache entry but with ND_LLINFO_NOSTATE. Such an entry still needs
an ARP resolution to send back a packet to the requester. The original
behavior before introducing the common ND framework didn't need the
resolution. IPv6 doesn't either. To restore the original behavior,
make a new ARP cache entry with ND_LLINFO_STALE like IPv6 does.

tests: dedup t_arp.sh like others (NFC)

tests: add tests for ARP cache entry creations
 1.311.2.2 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #812):

tests/net/arp/t_dad.sh: revision 1.16
sys/netinet/in.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.46
sys/netinet/if_arp.c: revision 1.314

arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
our own address, i.e., address duplication, the original behavior of
a kernel prior to supporting IPv4 DAD is to send an ARP reply. It is
the same with the latest kernel with DAD enabled. However, the latest
kernel without DAD sends back a GARP packet. Restore the original
behavior.

inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.

tests, arp: add tests of address duplications without DAD

tests, arp: add tests for GARP on link up
 1.311.2.1 10-Mar-2024  martin Pull up following revision(s) (requested by riastradh in ticket #618):

sys/netinet/if_arp.c: revision 1.312

Attribute debug message.
Fixes PR 57959
 1.313.2.1 02-Aug-2025  perseant Sync with HEAD
 1.25 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.24 15-Jan-1997  gwr branches: 1.24.2;
sync with if_ether.h
 1.23 09-Oct-1996  thorpej branches: 1.23.2;
Merge netbsd-1-2 branch back into mainline.
 1.22 11-May-1996  mycroft branches: 1.22.4;
When sending an ARP request, use the interface address for the route, rather
than the first address assigned. This gives slightly different behaviour in
the presence of aliases. From Bill Fenner, via Pete Bentley.
 1.21 13-Feb-1996  christos netinet prototypes
 1.20 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.19 16-May-1995  cgd parenthesize macro arg usage
 1.18 15-May-1995  cgd spacing fixups and KNF. #define ether address size, so it's not
hardcoded as '6' all over.
 1.17 17-Apr-1995  cgd spacing cleanup. also, minor type mixup fixups.
 1.16 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.15 10-Apr-1995  mycroft Remove now unneeded #ifdef. Prototype new function.
 1.14 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.13 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.12 06-Mar-1995  glass remove references to arptnew. fix spelling error
 1.11 27-Feb-1995  glass fix some typos. from frank@fwi.uva.nl (Frank van der Linden)
 1.10 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.7 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.6 30-Dec-1993  deraadt "struct ether_addr" for ethers(3) functions.
 1.5 13-Dec-1993  hpeyerl From cmaeda@cs.washington.edu; part of the multicast patches derived
from the Multicast patches for BSDI.

(I am a "big dopey bear" for having forgotten this. Thanx Havard.)
 1.4 05-Sep-1993  cassidy Add definition for reverse address resolution protocol.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.22.4.2 31-Aug-1996  thorpej Delete these from the NetBSD 1.2 release branch; they were tagged in error.
 1.22.4.1 14-Jun-1996  is Not needed for the netbsd-1-2 release
 1.23.2.1 18-Jan-1997  thorpej Update from trunk.
 1.24.2.1 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.40 06-Sep-2018  maxv Remove the network ATM code.
 1.39 12-Dec-2016  ozaki-r branches: 1.39.14; 1.39.16;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.38 28-Apr-2016  ozaki-r branches: 1.38.2;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.37 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route
See the diffs under tests/ for details of the behavior changes.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.36 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.35 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
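
The reference-counting discipline described in the 1.35 entry above can be sketched roughly as follows. This is only an illustrative toy, not the kernel code: struct toy_rtentry, toy_rt_lookup(), toy_rt_free() and toy_use_route() are hypothetical names, and the real rtentry and rtfree() carry far more state and do not look like this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct rtentry; only the reference count matters here. */
struct toy_rtentry {
	int refcnt;	/* counts table references AND in-flight users */
	int freed;	/* set when the last reference goes away */
};

/* A lookup returns the entry with its reference count already incremented. */
static struct toy_rtentry *
toy_rt_lookup(struct toy_rtentry *rt)
{
	rt->refcnt++;
	return rt;
}

/* Callers release through one function instead of writing refcnt-- directly. */
static void
toy_rt_free(struct toy_rtentry *rt)
{
	assert(rt != NULL && rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = 1;
}

/* Typical caller: take a reference, use the entry, then drop the reference. */
static int
toy_use_route(struct toy_rtentry *table_entry)
{
	struct toy_rtentry *rt = toy_rt_lookup(table_entry);
	int seen = rt->refcnt;	/* the entry is only touched while held */

	toy_rt_free(rt);	/* nothing touches rt after this point */
	return seen;
}
```

As the entry itself notes, a count like this only keeps the entry from being freed; it does not stop a concurrent update, which is why the log goes on to mention locks or another data structure.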
 1.34 10-Nov-2014  maxv branches: 1.34.2;
Do not uselessly include <sys/malloc.h>.
 1.33 24-Sep-2012  msaitoh branches: 1.33.12;
Add missing "\n" in log(9)
 1.32 01-Feb-2011  chuck branches: 1.32.4; 1.32.10; 1.32.14;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.31 18-Apr-2009  tsutsui branches: 1.31.4; 1.31.6; 1.31.8;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.30 18-Mar-2009  cegger bcopy -> memcpy
 1.29 24-Oct-2008  dyoung branches: 1.29.2; 1.29.8;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.28 22-May-2008  dyoung branches: 1.28.4;
Don't cast to void * unnecessarily.
 1.27 05-Sep-2007  dyoung branches: 1.27.20; 1.27.22; 1.27.24; 1.27.26;
We cannot sleep in a software interrupt, so do not sockaddr_dl_alloc(...,
M_WAITOK). Instead, sockaddr_dl_init() a sockaddr_dl on the stack.
 1.26 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
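
The idea behind the last point of the 1.26 entry, a setter that refuses to write past the end of the sockaddr, might be sketched like this. The structure and function below are hypothetical simplifications (toy_sockaddr_dl, toy_sdl_setaddr(), TOY_SDL_DATA); the real sockaddr_dl_setaddr() has a different signature and layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SDL_DATA 12	/* hypothetical capacity of the address area */

struct toy_sockaddr_dl {
	unsigned char sdl_alen;			/* bytes of address stored */
	unsigned char sdl_data[TOY_SDL_DATA];	/* link-layer address bytes */
};

/*
 * Copy addrlen bytes into sdl, refusing if they would not fit.
 * Returns sdl on success and NULL if the address is too long,
 * in which case sdl is left unmodified.
 */
static struct toy_sockaddr_dl *
toy_sdl_setaddr(struct toy_sockaddr_dl *sdl, const void *addr, size_t addrlen)
{
	assert(sdl != NULL && addr != NULL);
	if (addrlen > sizeof(sdl->sdl_data))
		return NULL;
	memcpy(sdl->sdl_data, addr, addrlen);
	sdl->sdl_alen = (unsigned char)addrlen;
	return sdl;
}
```

Compared with writing through a raw pointer into the address area, the length check gives the caller a failure to handle instead of a silent overrun.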
 1.25 07-Aug-2007  dyoung branches: 1.25.2; 1.25.4;
Use satocsdl() et cetera instead of SDL(). Constify.
 1.24 19-Jul-2007  dyoung branches: 1.24.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
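
The one functional change in the 1.24 entry above, applying a supplied netmask to the destination before the lookup, amounts to something like the following for IPv4. This is a sketch under assumptions: addresses are plain host-order 32-bit integers, and toy_prefix_to_mask() and toy_masked_key() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a contiguous IPv4 netmask (host byte order) from a prefix length. */
static uint32_t
toy_prefix_to_mask(int prefixlen)
{
	assert(prefixlen >= 0 && prefixlen <= 32);
	if (prefixlen == 0)
		return 0;	/* avoid an undefined 32-bit shift */
	return (uint32_t)(0xFFFFFFFFu << (32 - prefixlen));
}

/*
 * Return the key to search for: the destination with the netmask
 * applied if one was supplied, otherwise the destination unchanged.
 */
static uint32_t
toy_masked_key(uint32_t dst, const uint32_t *mask)
{
	return (mask != NULL) ? (dst & *mask) : dst;
}
```

With a /24 mask, for example, a request naming 192.168.1.23 would search for 192.168.1.0 rather than for the host address.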
 1.23 04-Mar-2007  christos branches: 1.23.2; 1.23.10;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.22 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
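
The split between rtcache_clear() and rtcache_free() that the 1.22 entry above anticipates could look roughly like this. It is a toy model only: struct toy_route and the toy_rtcache_*() functions are hypothetical, strings stand in for the route and the sockaddr, and the entry itself says the real rtcache_setdst() was still an unused stub at the time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy route cache: a borrowed route label plus an owned destination. */
struct toy_route {
	const char *ro_rt;	/* cached route, not owned by the cache */
	char *ro_dst;		/* destination, allocated and owned here */
};

/* Replace the destination with a private copy; returns 0 on success. */
static int
toy_rtcache_setdst(struct toy_route *ro, const char *dst)
{
	char *copy;

	assert(ro != NULL && dst != NULL);
	copy = malloc(strlen(dst) + 1);
	if (copy == NULL)
		return -1;
	strcpy(copy, dst);
	free(ro->ro_dst);
	ro->ro_dst = copy;
	return 0;
}

/* "Forget" the cached route but keep the destination. */
static void
toy_rtcache_clear(struct toy_route *ro)
{
	ro->ro_rt = NULL;
}

/* Forget the cached route and release the destination as well. */
static void
toy_rtcache_free(struct toy_route *ro)
{
	toy_rtcache_clear(ro);
	free(ro->ro_dst);
	ro->ro_dst = NULL;
}
```

An update path would call the clear variant so that the destination survives and the route can be looked up again.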
 1.21 16-Nov-2006  christos branches: 1.21.4;
__unused removal on arguments; approved by core.
 1.20 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.19 30-Aug-2006  christos branches: 1.19.2; 1.19.4;
fix initializers and add const.
 1.18 11-Dec-2005  christos branches: 1.18.4; 1.18.8;
merge ktrace-lwp.
 1.17 02-Feb-2005  perry branches: 1.17.6;
de-__P, do some ANSIfication.
 1.16 11-Sep-2002  itojun branches: 1.16.6; 1.16.14; 1.16.16;
KNF - return is not a function. sync w/kame.
 1.15 09-Jun-2002  itojun whitespace
 1.14 13-Nov-2001  lukem branches: 1.14.8;
add RCSIDs
 1.13 17-Jan-2001  itojun branches: 1.13.2; 1.13.4;
pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyways).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding the 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.12 30-Mar-2000  augustss Remove register declarations.
 1.11 01-Jul-1999  itojun branches: 1.11.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.10 13-Sep-1998  christos branches: 1.10.8; 1.10.10;
Fix copyright spacing and 'Van' -> 'van' for consistency.
 1.9 05-Jul-1998  jonathan defopt NATM.
 1.8 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.7 29-Apr-1998  thorpej Fix some whitespace.
 1.6 13-Oct-1996  christos backout previous kprintf changes
 1.5 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.4 03-Jul-1996  chuck ported ATM to FreeBSD 2.2-960612-SNAP
 1.3 29-Jun-1996  chuck change:
- change asock to rxhand and adjust all for this [esp atm_input]
 1.2 26-Jun-1996  chuck fixes/new stuff:
[1] if user tries to enter in a bogus PVC don't leave it in the routing
table ... remove it
[2] change ioctl arg to include rxso for lower layer
[3] add hooks (inside "NATM" ifdef) for native mode atm sockets so that
they don't clash with IP PVCs. [i am still debugging the native
mode atm socket protosw code]
 1.1 22-Jun-1996  chuck network support for ATM networks (ATM == Async Transfer Mode, not
Automatic Teller Machine).

Currently supports PVCs only (no ATM ARP either).
 1.10.10.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.10.10.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.10.8.1 01-Jul-1999  thorpej Sync w/ -current.
 1.11.2.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.11.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.4.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.13.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.13.2.3 17-Sep-2002  nathanw Catch up to -current.
 1.13.2.2 20-Jun-2002  nathanw Catch up to -current.
 1.13.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.14.8.1 20-Jun-2002  gehenna catch up with -current.
 1.16.16.1 12-Feb-2005  yamt sync with head.
 1.16.14.1 29-Apr-2005  kent sync with -current
 1.16.6.1 04-Feb-2005  skrll Sync with HEAD.
 1.17.6.4 27-Oct-2007  yamt sync with head.
 1.17.6.3 03-Sep-2007  yamt sync with head.
 1.17.6.2 26-Feb-2007  yamt sync with head.
 1.17.6.1 30-Dec-2006  yamt sync with head.
 1.18.8.1 03-Sep-2006  yamt sync with head.
 1.18.4.1 09-Sep-2006  rpaulo sync with head
 1.19.4.2 10-Dec-2006  yamt sync with head.
 1.19.4.1 22-Oct-2006  yamt sync with head
 1.19.2.1 18-Nov-2006  ad Sync with head.
 1.21.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.21.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.23.10.3 10-Sep-2007  skrll Sync with HEAD.
 1.23.10.2 03-Sep-2007  skrll Sync with HEAD.
 1.23.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.23.2.2 09-Oct-2007  ad Sync with head.
 1.23.2.1 20-Aug-2007  ad Sync with HEAD.
 1.24.4.3 02-Oct-2007  joerg Sync with HEAD.
 1.24.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.24.4.1 09-Aug-2007  jmcneill Sync with HEAD.
 1.25.4.2 07-Aug-2007  dyoung Use satocsdl() et cetera instead of SDL(). Constify.
 1.25.4.1 07-Aug-2007  dyoung file if_atm.c was added on branch matt-mips64 on 2007-08-07 04:37:05 +0000
 1.25.2.1 06-Nov-2007  matt sync with HEAD
 1.27.26.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.27.24.1 04-May-2009  yamt sync with head.
 1.27.22.1 04-Jun-2008  yamt sync with head
 1.27.20.2 17-Jan-2009  mjf Sync with HEAD.
 1.27.20.1 02-Jun-2008  mjf Sync with HEAD.
 1.28.4.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.29.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.29.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.31.8.1 08-Feb-2011  bouyer Sync with HEAD
 1.31.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.31.4.1 05-Mar-2011  rmind sync with head
 1.32.14.2 03-Dec-2017  jdolecek update from HEAD
 1.32.14.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.32.10.1 23-Oct-2012  riz Pull up following revision(s) (requested by msaitoh in ticket #616):
sys/netinet/if_atm.c: revision 1.33
sys/net/if_arcsubr.c: revision 1.64
sys/netinet/ip_mroute.c: revision 1.126
Add missing "\n" in log(9)
 1.32.4.1 30-Oct-2012  yamt sync with head
 1.33.12.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.34.2.4 05-Feb-2017  skrll Sync with HEAD
 1.34.2.3 29-May-2016  skrll Sync with HEAD
 1.34.2.2 22-Apr-2016  skrll Sync with HEAD
 1.34.2.1 22-Sep-2015  skrll Sync with HEAD
 1.38.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.39.16.1 10-Jun-2019  christos Sync with HEAD
 1.39.14.1 30-Sep-2018  pgoyette Sync with HEAD
 1.14 06-Sep-2018  maxv Remove the network ATM code.
 1.13 28-Apr-2016  ozaki-r branches: 1.13.16; 1.13.18;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.12 01-Feb-2011  chuck branches: 1.12.14; 1.12.32;
update license clauses on my code to match the new-style BSD licenses.
based on diff that rmind@ sent me.

no functional change with this commit.
 1.11 24-Oct-2008  dyoung branches: 1.11.16; 1.11.22; 1.11.24;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.10 17-Feb-2007  dyoung branches: 1.10.38; 1.10.42; 1.10.48;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.9 10-Dec-2005  elad branches: 1.9.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.8 21-Apr-2004  itojun branches: 1.8.12;
no space between function name and paren: foo (blah) -> foo(blah)
 1.7 18-Apr-2004  matt De __P()
 1.6 09-Jun-2002  itojun branches: 1.6.6;
whitespace
 1.5 17-Jan-2001  itojun branches: 1.5.2; 1.5.4; 1.5.16;
pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyways).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding the 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.4 13-Sep-1998  christos branches: 1.4.12;
Fix copyright spacing and 'Van' -> 'van' for consistency.
 1.3 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.2 03-Jul-1996  chuck ported ATM to FreeBSD 2.2-960612-SNAP
 1.1 22-Jun-1996  chuck network support for ATM networks (ATM == Async Transfer Mode, not
Automatic Teller Machine).

Currently supports PVCs only (no ATM ARP either).
 1.4.12.1 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.5.16.1 20-Jun-2002  gehenna catch up with -current.
 1.5.4.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.5.2.1 20-Jun-2002  nathanw Catch up to -current.
 1.6.6.4 11-Dec-2005  christos Sync with head.
 1.6.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.6.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.6.6.1 03-Aug-2004  skrll Sync with HEAD
 1.8.12.2 26-Feb-2007  yamt sync with head.
 1.8.12.1 21-Jun-2006  yamt sync with head.
 1.9.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.10.48.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.10.42.1 04-May-2009  yamt sync with head.
 1.10.38.1 17-Jan-2009  mjf Sync with HEAD.
 1.11.24.1 08-Feb-2011  bouyer Sync with HEAD
 1.11.22.1 06-Jun-2011  jruoho Sync with HEAD.
 1.11.16.1 05-Mar-2011  rmind sync with head
 1.12.32.1 29-May-2016  skrll Sync with HEAD
 1.12.14.1 03-Dec-2017  jdolecek update from HEAD
 1.13.18.1 10-Jun-2019  christos Sync with HEAD
 1.13.16.1 30-Sep-2018  pgoyette Sync with HEAD
 1.35 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.34 13-Oct-1996  christos branches: 1.34.4;
backout previous kprintf changes
 1.33 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.32 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.31 11-May-1996  mycroft When sending an ARP request, use the interface address for the route, rather
than the first address assigned. This gives slightly different behaviour in
the presence of aliases. From Bill Fenner, via Pete Bentley.
 1.30 07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.29 30-Mar-1996  christos Fix db_printf formats
 1.28 13-Feb-1996  christos netinet prototypes
 1.27 12-Aug-1995  mycroft splnet --> splsoftnet
 1.26 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.25 04-Jun-1995  mycroft Clean up many more casts.
 1.24 15-May-1995  cgd spacing fixups and KNF. #define ether address size, so it's not
hardcoded as '6' all over.
 1.23 17-Apr-1995  cgd spacing cleanup. also, minor type mixup fixups.
 1.22 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.21 11-Apr-1995  mycroft Remove some explicit references to loif.
 1.20 07-Apr-1995  mycroft Add a common function to initialize ARP-related variables. `Inspired'
by Garrett Wollman.
 1.19 06-Mar-1995  glass remove references to arptnew. fix spelling error
 1.18 27-Jul-1994  mycroft Fix byte-order bug in printf() statement.
 1.17 24-Jul-1994  cgd kill conflicting externs
 1.16 29-Jun-1994  cgd branches: 1.16.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.15 21-Jun-1994  chopps config.new hack for if_ether.c for lack of an `and' in the grammar
and protect some ether specific code in in.c
 1.14 04-Jun-1994  gwr Back out some of my changes which Keith Sklower convinced me are
unnecessary. Leaving in just the essentials of the fix.
 1.13 03-Jun-1994  gwr Avoid accidentally creating permanent entries at time==0
Routes created with RTM_ADD (i.e. manually added) are
permanent so leave their expiration time set to zero.
 1.12 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.11 29-Apr-1994  cgd change timeout/untimeout/wakeup/sleep/tsleep args to void *
 1.10 18-Apr-1994  mycroft Dummy arpintr() for now.
 1.9 18-Apr-1994  glass revised nfs diskless support. uses bootp+rpc to gather parameters
 1.8 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.7 23-Jan-1994  deraadt ether_output() & ether_input() take ether_type as a net-short.
AF_UNSPEC does not swap byte order of ether_type.
NOTE: this requires driver changes
 1.6 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 12-Dec-1993  hpeyerl From cmaeda@cs.washington.edu; part of the multicast patches derived
from the Multicast patches for BSDI.

Thanx to Brad Parker for making me realize i'd forgotten to commit
this patch..(color me dopey)
 1.3 27-Jun-1993  andrew branches: 1.3.4;
ANSIfications - removed all implicit function return types and argument
definitions. Ensured that all files include "systm.h" to gain access to
general prototypes. Casts where necessary.
 1.2 20-May-1993  cgd more rcsid additions and file header cleanups
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.16.2.2 28-Jul-1994  cgd from trunk.
 1.16.2.1 24-Jul-1994  cgd from trunk.
 1.34.4.11 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.34.4.10 06-Mar-1997  is Use <net/ethertypes.h> instead of including all the Ethernet declarations.
 1.34.4.9 19-Feb-1997  is In arpresolve(), only copy up to ifp->if_data.ifi_addrlen bytes to the edst[]
array.
Earlier, ARP entries with excessively long addresses would overwrite part of the
stack past the end of edst[].
 1.34.4.8 18-Feb-1997  is Don't use data gotten over the network as the length for yet another bcmp.
We don't want to be vulnerable to crashes due to bad guys sending us strange
packets.
 1.34.4.7 18-Feb-1997  is Add sanity check for ah->ar_hln in in_arpinput(). We don't want to crash
when people send us bogus ARP packets. At the moment, works only for
link level protocols with constant address length.
 1.34.4.6 18-Feb-1997  is Having converted everything, remove the struct ether_arp definition completely.
Some small cleanup.
STILLTODO: some sanity checks of the (now) variable link level address length
in incoming packets..
 1.34.4.5 17-Feb-1997  is Converted revarprequest() to the new world order.
 1.34.4.4 17-Feb-1997  is Changed in_revarpinput() to new ARP.
 1.34.4.3 13-Feb-1997  is Convert the arp receive/reply code to the new world order.
 1.34.4.2 12-Feb-1997  is Changed arprequest() to use AF_ARP sockaddr and NOT build its own Ethernet
header. Added some missing pieces in ether_output() to support this.
 1.34.4.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.37 03-Feb-2021  roy Guard CTASSERT
 1.36 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.35 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.34 25-Dec-2007  perry branches: 1.34.110;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.33 25-Sep-2006  sketch branches: 1.33.24; 1.33.30; 1.33.34; 1.33.38;
typo.
 1.32 10-Dec-2005  elad branches: 1.32.20; 1.32.22;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.31 07-Aug-2003  agc branches: 1.31.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.30 20-Nov-1999  thorpej branches: 1.30.28;
Add the `packed' attribute to structures which describe wire protocol data.
 1.29 10-Feb-1998  perry branches: 1.29.14; 1.29.20;
add/cleanup multiple inclusion protection.
 1.28 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.27 29-Jul-1997  is Include file in place of the old netinet/if_ether.h, including stuff from
where it is now, and adding the specialized for Ethernet version of the ARP
structure, for the benefit of programs which are externally (to us) maintained
and not (yet) ported.
XXX This should NOT be used inside the kernel.
 1.26 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.25 17-Jan-1997  mikel branches: 1.25.2;
fix my typo; found by Klaus Klein <kleink@layla.inka.de>
 1.24 17-Jan-1997  mikel add prototypes for ethers(3) functions; fixes PR 2471.
fix suggested by Jason Thorpe.
 1.23 09-Oct-1996  thorpej Merge netbsd-1-2 branch back into mainline.
 1.22 11-May-1996  mycroft branches: 1.22.4;
When sending an ARP request, use the interface address for the route, rather
than the first address assigned. This gives slightly different behaviour in
the presence of aliases. From Bill Fenner, via Pete Bentley.
 1.21 13-Feb-1996  christos netinet prototypes
 1.20 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.19 16-May-1995  cgd parenthesize macro arg usage
 1.18 15-May-1995  cgd spacing fixups and KNF. #define ether address size, so it's not
hardcoded as '6' all over.
 1.17 17-Apr-1995  cgd spacing cleanup. also, minor type mixup fixups.
 1.16 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.15 10-Apr-1995  mycroft Remove now unneeded #ifdef. Prototype new function.
 1.14 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.13 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.12 06-Mar-1995  glass remove references to arptnew. fix spelling error
 1.11 27-Feb-1995  glass fix some typos. from frank@fwi.uva.nl (Frank van der Linden)
 1.10 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.7 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.6 30-Dec-1993  deraadt "struct ether_addr" for ethers(3) functions.
 1.5 13-Dec-1993  hpeyerl From cmaeda@cs.washington.edu; part of the multicast patches derived
from the Multicast patches for BSDI.

(I am a "big dopey bear" for having forgotten this. Thanx Havard.)
 1.4 05-Sep-1993  cassidy Add definition for reverse address resolution protocol.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.22.4.1 17-Jun-1996  gwr Pad the struct arpcom to avoid unnecessary misalignments on m68k ports.
 1.25.2.4 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.25.2.3 06-Mar-1997  is Wipe out double function prototypes.
 1.25.2.2 18-Feb-1997  is Having converted everything, remove the struct ether_arp definition completely.
Some small cleanup.
STILLTODO: some sanity checks of the (now) variable link level address length
in incoming packets..
 1.25.2.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.29.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.29.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.30.28.4 11-Dec-2005  christos Sync with head.
 1.30.28.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.30.28.2 18-Sep-2004  skrll Sync with HEAD.
 1.30.28.1 03-Aug-2004  skrll Sync with HEAD
 1.31.16.3 21-Jan-2008  yamt sync with head
 1.31.16.2 30-Dec-2006  yamt sync with head.
 1.31.16.1 21-Jun-2006  yamt sync with head.
 1.32.22.1 22-Oct-2006  yamt sync with head
 1.32.20.1 18-Nov-2006  ad Sync with head.
 1.33.38.1 02-Jan-2008  bouyer Sync with HEAD
 1.33.34.1 26-Dec-2007  ad Sync with head.
 1.33.30.1 18-Feb-2008  mjf Sync with HEAD.
 1.33.24.1 09-Jan-2008  matt sync with HEAD
 1.34.110.1 03-Apr-2021  thorpej Sync with HEAD.
 1.9 24-Jun-2002  itojun integrate IEEE1394 ARP into generic ARP logic.
XXX there's no check at all in ar_hrd, and we don't set ar_hrd on outgoing.
it seems like a bad thing.
 1.8 09-Jun-2002  itojun whitespace
 1.7 15-Nov-2001  lukem branches: 1.7.8;
don't need <sys/types.h> when including <sys/param.h>
 1.6 13-Nov-2001  lukem add RCSIDs
 1.5 04-Jul-2001  itojun branches: 1.5.2;
better support for multiple IPv4 addresses on a single interface.
- consider non-primary (2nd and beyond) IPv4 address as "local", and prevent
outgoing ARP.
- for routing entries generated by ARP, make sure to set rt->rt_ifa equal to
rt_key, to help IPv4 source address selection for traffic to myself.
PR 13311.

caveats/TODOs:
- interface routes ("connected routes" in Cisco terminology) are tied to the
primary (1st) IPv4 address on the interface. should be fixed with updates
to rt_ifinit().
- source address selection for offlink locations. the 1st address tends to be used
with the current code
(you can configure it right by setting rt->rt_ifa accordingly).
 1.4 12-Jun-2001  wiz receive, not recieve
 1.3 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.2 17-Jan-2001  itojun branches: 1.2.2;
pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.1 05-Nov-2000  onoe branches: 1.1.2;
First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
character device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.1.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.1.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.1.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.1.2.1 05-Nov-2000  bouyer file if_ieee1394arp.c was added on branch thorpej_scsipi on 2000-11-22 16:06:07 +0000
 1.2.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.2.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.2.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.2.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.2.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.2.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.5.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.5.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.5.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.7.8.2 15-Jul-2002  gehenna catch up with -current.
 1.7.8.1 20-Jun-2002  gehenna catch up with -current.
 1.3 24-Jun-2002  itojun integrate IEEE1394 ARP into generic ARP logic.
XXX there's no check at all in ar_hrd, and we don't set ar_hrd on outgoing.
it seems like a bad thing.
 1.2 17-Jan-2001  itojun branches: 1.2.2; 1.2.4; 1.2.16;
pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.1 05-Nov-2000  onoe branches: 1.1.2;
First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
character device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.1.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.1.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.1.2.1 05-Nov-2000  bouyer file if_ieee1394arp.h was added on branch thorpej_scsipi on 2000-11-22 16:06:07 +0000
 1.2.16.1 15-Jul-2002  gehenna catch up with -current.
 1.2.4.1 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.2.2.1 01-Aug-2002  nathanw Catch up to -current.
 1.53 03-Sep-2022  thorpej Convert ARP from a legacy netisr to pktqueue.
 1.52 11-Sep-2020  roy ARP: Use ND rather than our own.

This brings the benefit of Neighbour Unreachability Detection which is
something ARP sorely lacks.

The new timings mirror those of IPv6 and are adjustable via sysctl(8).
Unlike IPv6 ND, these are global and not per interface.
 1.51 21-Feb-2017  ozaki-r Replace malloc for DAD with kmem and move them out of the lock for DAD
 1.50 11-Oct-2016  roy branches: 1.50.2;
Mark arprequest static and introduce arpannounce so that gratuitous
ARP requests are only sent from valid addresses.
 1.49 19-Apr-2016  ozaki-r branches: 1.49.2;
Constify rtentry of arpresolve

We don't need to (rather shouldn't) modify rtentry in there.
 1.48 07-Apr-2016  christos - tidy up error messages
- add a length argument to arpresolve()
- add KASSERT for overflow
 1.47 21-May-2015  ozaki-r Make arp_init, in_revarpinput and revarprequest static
 1.46 16-May-2015  roy Separate ARP handling DAD from inet.
This is done by signalling the intent to try tentative addresses
and then clearing the intent once the address is setup.
When the ARP handler is installed (arp_ifinit) then it adds
dad start and stop functions to the address which are used instead
of calling ARP directly.
 1.45 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.44 30-Sep-2012  dholland branches: 1.44.14;
Requires <sys/queue.h> for LIST_ENTRY and netinet/in.h for struct in_addr.
 1.43 11-Nov-2011  gdt branches: 1.43.10;
Move RTF_ANNOUNCE flag so that it no longer conflicts with RTF_PROTO2.

RTF_ANNOUNCE was defined as RTF_PROTO2. The flag is used to indicate
that the host should act as a proxy for a link level arp or ndp request.
(If RTF_PROTO2 is used as an experimental flag (as advertised),
various problems can occur.)

This commit provides a first-class definition with its own bit for
RTF_ANNOUNCE, removes the old aliasing definitions, and adds support
for the new RTF_ANNOUNCE flag to netstat(8) and route(8).

Also, remove unused RTF_ flags that collide with RTF_PROTO1:
netinet/icmp6.h defined RTF_PROBEMTU as RTF_PROTO1
netinet/if_inarp.h defined RTF_USETRAILERS as RTF_PROTO1
(Neither of these flags are used anywhere. Both have been removed
to reduce chances of collision with RTF_PROTO1.)

Figuring this out and the diff are the work of Beverly Schwartz of
BBN.

(Passed release build, boot in VM, with no apparently related atf
failures.)

Approved for Public Release, Distribution Unlimited
This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073.
 1.42 18-Feb-2009  yamt branches: 1.42.12;
remove unused #define.
 1.41 24-Oct-2008  dyoung branches: 1.41.2; 1.41.8;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.40 15-Apr-2008  thorpej branches: 1.40.4; 1.40.10;
Make ARP stats per-cpu.
 1.39 04-Mar-2007  christos branches: 1.39.36;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.38 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.37 18-May-2006  liamjfoy branches: 1.37.14;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.36 10-Dec-2005  elad branches: 1.36.4; 1.36.6; 1.36.8; 1.36.12;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.35 21-Apr-2004  itojun branches: 1.35.12;
no space between function name and paren: foo (blah) -> foo(blah)
 1.34 18-Apr-2004  matt De __P()
 1.33 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.32 17-Jan-2001  itojun branches: 1.32.24;
pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost no one is using it anyway).

benefit: the following command now works. previously we needed two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.31 20-May-2000  jhawk Install "show arptab" (db_show_arptab) in the ddb command tree.
Move prototype from netinet/if_inarp.h to ddb/db_interface.h.
Change function to have standard ddb parameters (though they're
ignored).
 1.30 30-Mar-2000  simonb Extern decl of arpintrq.
 1.29 05-Aug-1999  sommerfeld branches: 1.29.2;
Implement arp_drain(), which frees packets tied up in the arp cache if
mbufs are in short supply.
Create a (trivial) protocol domain for arp so that the drain routine will
be called from m_reclaim()
 1.28 21-Feb-1999  drochner -always do an RARP if revarpwhoarewe() is called, it might be for another
interface or the server's configuration has changed
-g/c revarpwhoami()
 1.27 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.26 15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.25 17-Jan-1997  mikel branches: 1.25.2;
fix my typo; found by Klaus Klein <kleink@layla.inka.de>
 1.24 17-Jan-1997  mikel add prototypes for ethers(3) functions; fixes PR 2471.
fix suggested by Jason Thorpe.
 1.23 09-Oct-1996  thorpej Merge netbsd-1-2 branch back into mainline.
 1.22 11-May-1996  mycroft branches: 1.22.4;
When sending an ARP request, use the interface address for the route, rather
than the first address assigned. This gives slightly different behaviour in
the presence of aliases. From Bill Fenner, via Pete Bentley.
 1.21 13-Feb-1996  christos netinet prototypes
 1.20 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.19 16-May-1995  cgd parenthesize macro arg usage
 1.18 15-May-1995  cgd spacing fixups and KNF. #define ether address size, so it's not
hardcoded as '6' all over.
 1.17 17-Apr-1995  cgd spacing cleanup. also, minor type mixup fixups.
 1.16 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.15 10-Apr-1995  mycroft Remove now unneeded #ifdef. Prototype new function.
 1.14 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.13 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.12 06-Mar-1995  glass remove references to arptnew. fix spelling error
 1.11 27-Feb-1995  glass fix some typos. from frank@fwi.uva.nl (Frank van der Linden)
 1.10 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.7 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.6 30-Dec-1993  deraadt "struct ether_addr" for ethers(3) functions.
 1.5 13-Dec-1993  hpeyerl From cmaeda@cs.washington.edu; part of the multicast patches derived
from the Multicast patches for BSDI.

(I am a "big dopey bear" for having forgotten this. Thanx Havard.)
 1.4 05-Sep-1993  cassidy Add definition for reverse address resolution protocol.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.22.4.1 17-Jun-1996  gwr Pad the struct arpcom to avoid unnecessary misalignments on m68k ports.
 1.25.2.3 06-Mar-1997  is Wipe out double function prototypes.
 1.25.2.2 18-Feb-1997  is Having converted everything, remove the struct ether_arp definition completely.
Some small cleanup.
STILLTODO: some sanity checks of the (now) variable link level address length
in incoming packets..
 1.25.2.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.29.2.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.29.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.32.24.4 11-Dec-2005  christos Sync with head.
 1.32.24.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.32.24.2 18-Sep-2004  skrll Sync with HEAD.
 1.32.24.1 03-Aug-2004  skrll Sync with HEAD
 1.35.12.3 03-Sep-2007  yamt sync with head.
 1.35.12.2 26-Feb-2007  yamt sync with head.
 1.35.12.1 21-Jun-2006  yamt sync with head.
 1.36.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.36.8.1 24-May-2006  yamt sync with head.
 1.36.6.1 01-Jun-2006  kardel Sync with head.
 1.36.4.1 09-Sep-2006  rpaulo sync with head
 1.37.14.2 12-Mar-2007  rmind Sync with HEAD.
 1.37.14.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.39.36.2 17-Jan-2009  mjf Sync with HEAD.
 1.39.36.1 02-Jun-2008  mjf Sync with HEAD.
 1.40.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.40.4.1 04-May-2009  yamt sync with head.
 1.41.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.41.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.42.12.2 30-Oct-2012  yamt sync with head
 1.42.12.1 17-Apr-2012  yamt sync with head
 1.43.10.2 03-Dec-2017  jdolecek update from HEAD
 1.43.10.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.44.14.4 28-Aug-2017  skrll Sync with HEAD
 1.44.14.3 05-Dec-2016  skrll Sync with HEAD
 1.44.14.2 22-Apr-2016  skrll Sync with HEAD
 1.44.14.1 06-Jun-2015  skrll Sync with HEAD
 1.49.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.49.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.50.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.70 15-May-2020  maxv igmp_sendpkt() expects ip_output() to set 'imo.imo_multicast_ttl' into
'ip->ip_ttl'; but ip_output() won't if the target is not a multicast
address, meaning that the uninitialized 'ip->ip_ttl' byte gets sent to
the network. This leaks one byte of kernel heap.

Fix this by filling 'ip->ip_ttl' with a TTL of one.

Found by KMSAN.

Reported-by: syzbot+e49f7b8a8fec5a477c9a@syzkaller.appspotmail.com
 1.69 14-Sep-2018  maxv branches: 1.69.4;
Use non-variadic function pointer in protosw::pr_input.
 1.68 21-Jun-2018  knakahara branches: 1.68.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held, that is, in the case of !defined(NET_MPSAFE) it is
acquired in ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.67 10-Apr-2018  maxv Replace comment by KASSERT.
 1.66 07-Feb-2018  maxv branches: 1.66.2;
Remove RSVP_ISI, that's mostly dead code. FreeBSD and OpenBSD too removed
it; FreeBSD kept some pieces but they are mostly no-ops.

Sent on tech-net@, no comment.
 1.65 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.64 24-Jan-2017  ozaki-r branches: 1.64.6;
Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.63 11-Jan-2017  ozaki-r branches: 1.63.2;
Get rid of unnecessary header inclusions
 1.62 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.61 08-Jul-2016  ozaki-r branches: 1.61.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.60 28-Jun-2016  ozaki-r Add missing NULL checks for m_get_rcvif_psref
 1.59 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time while we reach it through them. Thus we need to look
up the ifnet object by if_index every time to be safe.
 1.58 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.57 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.56 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.55 29-May-2014  rmind branches: 1.55.4;
Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.54 25-Feb-2014  pooka branches: 1.54.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.53 09-Jan-2012  liamjfoy branches: 1.53.6; 1.53.10;
check against NULL
 1.52 17-Jul-2011  joerg branches: 1.52.2; 1.52.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.51 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.50 13-Sep-2009  pooka Wipe out the last vestiges of POOL_INIT with one swift stroke. In
most cases, use a proper constructor. For proplib, give a local
equivalent of POOL_INIT for the kernel object implementation. This
way the code structure can be preserved, and a local link set is
not hazardous anyway (unless proplib is split to several modules,
but that'll be the day).

tested by booting a kernel in qemu and compile-testing i386/ALL
 1.49 04-May-2008  thorpej Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.48 24-Apr-2008  ad branches: 1.48.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.47 23-Apr-2008  thorpej Use <net/net_stats.h> / netstat_sysctl().
 1.46 15-Apr-2008  thorpej branches: 1.46.2;
Make IGMP stats per-cpu.
 1.45 25-Apr-2007  dyoung branches: 1.45.28;
Get rid of some gratuitous casts and join some lines.
 1.44 12-Mar-2007  ad branches: 1.44.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.43 05-Oct-2006  tls branches: 1.43.4; 1.43.8;
Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.42 11-Dec-2005  christos branches: 1.42.20; 1.42.22;
merge ktrace-lwp.
 1.41 03-Feb-2005  perry branches: 1.41.6;
ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.40 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.39 13-Nov-2004  christos branches: 1.39.4; 1.39.6;
PR/25749: Peter Postma: missing splx() in kernel.
 1.38 26-Apr-2004  matt Remove #else clause of __STDC__
 1.37 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.36 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.35 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.34 26-Jun-2003  itojun branches: 1.34.2;
purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.33 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.32 07-Nov-2002  thorpej Fix signed/unsigned comparison warnings.
 1.31 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.30 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
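
As a rough illustration of the *_HDR_ALIGNED_P() idea described above, here is a hedged sketch rather than the macros actually committed. The DEMO_ names and the 4-byte alignment are assumptions for this example, and DEMO_NO_STRICT_ALIGNMENT stands in for __NO_STRICT_ALIGNMENT.

```c
#include <assert.h>
#include <stdint.h>

/*
 * On platforms without strict alignment requirements the predicate
 * collapses to the constant 1, so the compiler can drop the
 * align-if-needed path entirely. Otherwise test the low address bits
 * (4-byte alignment is assumed here).
 */
#ifdef DEMO_NO_STRICT_ALIGNMENT
#define DEMO_HDR_ALIGNED_P(p)	1
#else
#define DEMO_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Input-path use: decide whether the header must be copied to an aligned spot. */
static int
demo_hdr_needs_realign(const void *hdr)
{
	return !DEMO_HDR_ALIGNED_P(hdr);
}

/* Other use: assert that a header is already aligned where that is guaranteed. */
static void
demo_hdr_assert_aligned(const void *hdr)
{
	assert(DEMO_HDR_ALIGNED_P(hdr));
	(void)hdr;	/* keeps the parameter used when assertions are compiled out */
}
```

A caller on the receive path would check demo_hdr_needs_realign() first and only then take the copying path, which is where a routine like m_copyup() would come in.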
 1.29 09-Jun-2002  itojun whitespace
 1.28 12-May-2002  matt branches: 1.28.2; 1.28.4;
Eliminate commons.
 1.27 13-Nov-2001  lukem add RCSIDs
 1.26 25-Jul-2001  enami Remove an obsolete comment.
 1.25 16-Jun-2000  matt branches: 1.25.2; 1.25.4; 1.25.6;
Don't copy M_EXT mbufs unless in "dhcp" mode. Do a mtod after the pullup
to make sure the ip pointer is still valid.
 1.24 30-Mar-2000  augustss branches: 1.24.2;
Remove register declarations.
 1.23 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.22 09-Jul-1999  thorpej branches: 1.22.2;
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.21 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.20 25-Apr-1999  hwr If the incoming code value is 0, timer gets 0, which would mean
a divide by zero afterwards.
This is also, what Bill Fenner seems to have done in the FreeBSD igmp
code.
This should fix kern/6541.
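
The guard described here can be sketched as follows. This is an illustration under stated assumptions, not the committed code: the function names are hypothetical, the constants are assumed values for the fast-timeout rate and the units of the code field, and the modulo in demo_random_delay() merely stands in for whatever division the real code performs afterwards.

```c
#include <assert.h>

#define DEMO_PR_FASTHZ		5	/* assumed: fast timeouts per second */
#define DEMO_IGMP_TIMER_SCALE	10	/* assumed: code field counts tenths of a second */

/* Convert the code field of a received query into timer ticks, never zero. */
static unsigned
demo_igmp_timer_from_code(unsigned code)
{
	unsigned timer = code * DEMO_PR_FASTHZ / DEMO_IGMP_TIMER_SCALE;

	if (timer == 0)
		timer = 1;	/* a code of 0 (or a tiny one) must not yield 0 */
	return timer;
}

/* Pick a delay in [1, timer]; the modulo would trap if timer were 0. */
static unsigned
demo_random_delay(unsigned timer, unsigned random_value)
{
	assert(timer != 0);
	return 1 + random_value % timer;
}
```

Clamping at the conversion step means every later user of the timer value can rely on it being nonzero.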
 1.19 19-Jan-1999  mycroft branches: 1.19.2;
Don't screw with ip_len; just subtract from it where we actually use the
value.
 1.18 13-Feb-1998  tls branches: 1.18.6;
Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
 1.17 12-Jan-1998  scottr Use option header file for MROUTING
 1.16 09-Sep-1996  mycroft branches: 1.16.14;
Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.15 13-Feb-1996  christos netinet prototypes
 1.14 12-Aug-1995  mycroft splnet --> splsoftnet
 1.13 04-Jun-1995  mycroft Don't cast things unnecessarily.
 1.12 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.11 01-Jun-1995  mycroft Add missing ntohl() in multicast test.
 1.10 31-May-1995  mycroft Implement IGMP v2. Based on the Multicast 3.5 distribution.
 1.9 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 09-Jun-1994  brezak Patch to fix ip cksum errors. From mccanne@ee.lbl.gov (Steven McCanne).
 1.6 04-Jun-1994  mycroft Modify the loopback checks to deal with multiple interfaces.
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.3 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.2 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.1 06-Dec-1993  hpeyerl branches: 1.1.1;
multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.1 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.16.14.1 01-Oct-1998  cgd pull up revision 1.18 from trunk. (tls)
 1.18.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.19.2.1 26-Apr-1999  perry branches: 1.19.2.1.2; 1.19.2.1.4;
pullup 1.19->1.20 (Heiko Rupp): fix handling of packets with 0 timer values.
 1.19.2.1.4.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.19.2.1.4.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.19.2.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.19.2.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.19.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.22.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.24.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.25.6.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.25.6.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.25.6.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.25.6.1 03-Aug-2001  lukem update to -current
 1.25.4.6 11-Nov-2002  nathanw Catch up to -current
 1.25.4.5 27-Aug-2002  nathanw Catch up to -current.
 1.25.4.4 01-Aug-2002  nathanw Catch up to -current.
 1.25.4.3 20-Jun-2002  nathanw Catch up to -current.
 1.25.4.2 14-Nov-2001  nathanw Catch up to -current.
 1.25.4.1 24-Aug-2001  nathanw Catch up with -current.
 1.25.2.1 04-Aug-2003  msaitoh Pull up revision 1.34 (requested by itojun in ticket #50):
purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.28.4.2 02-Jul-2003  tron Apply patch (requested by itojun in ticket #1360):
Make it compilable.
 1.28.4.1 30-Jun-2003  grant Pull up revision 1.34 (requested by itojun in ticket #1341):

purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.28.2.3 29-Aug-2002  gehenna catch up with -current.
 1.28.2.2 15-Jul-2002  gehenna catch up with -current.
 1.28.2.1 20-Jun-2002  gehenna catch up with -current.
 1.34.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.34.2.4 29-Nov-2004  skrll Sync with HEAD.
 1.34.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.34.2.1 03-Aug-2004  skrll Sync with HEAD
 1.39.6.1 12-Feb-2005  yamt sync with head.
 1.39.4.1 29-Apr-2005  kent sync with -current
 1.41.6.2 03-Sep-2007  yamt sync with head.
 1.41.6.1 30-Dec-2006  yamt sync with head.
 1.42.22.1 22-Oct-2006  yamt sync with head
 1.42.20.1 18-Nov-2006  ad Sync with head.
 1.43.8.2 08-Jun-2007  ad Sync with head.
 1.43.8.1 13-Mar-2007  ad Sync with head.
 1.43.4.2 07-May-2007  yamt sync with head.
 1.43.4.1 24-Mar-2007  yamt sync with head.
 1.44.2.1 11-Jul-2007  mjf Sync with head.
 1.45.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.46.2.1 18-May-2008  yamt sync with head.
 1.48.2.3 11-Mar-2010  yamt sync with head
 1.48.2.2 16-Sep-2009  yamt sync with head
 1.48.2.1 16-May-2008  yamt sync with head.
 1.52.6.1 18-Feb-2012  mrg merge to -current.
 1.52.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.52.2.1 17-Apr-2012  yamt sync with head
 1.53.10.1 18-May-2014  rmind sync with head
 1.53.6.2 03-Dec-2017  jdolecek update from HEAD
 1.53.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.54.2.1 10-Aug-2014  tls Rebase.
 1.55.4.5 05-Feb-2017  skrll Sync with HEAD
 1.55.4.4 05-Oct-2016  skrll Sync with HEAD
 1.55.4.3 09-Jul-2016  skrll Sync with HEAD
 1.55.4.2 29-May-2016  skrll Sync with HEAD
 1.55.4.1 22-Sep-2015  skrll Sync with HEAD
 1.61.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.61.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.63.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.64.6.3 25-May-2020  martin Pull up following revision(s) (requested by christos in ticket #1549):

sys/netinet/igmp.c: revision 1.70
sys/kern/kern_time.c: revision 1.204

igmp_sendpkt() expects ip_output() to set 'imo.imo_multicast_ttl' into
'ip->ip_ttl'; but ip_output() won't if the target is not a multicast
address, meaning that the uninitialized 'ip->ip_ttl' byte gets sent to
the network. This leaks one byte of kernel heap.

Fix this by filling 'ip->ip_ttl' with a TTL of one.
Found by KMSAN.

-

Fix uninitialized memory access. Found by KMSAN.
 1.64.6.2 13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: if !defined(NET_MPSAFE) it is acquired in ipintr(),
otherwise (defined(NET_MPSAFE)) it is acquired in the PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.64.6.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoid using it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created a unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.66.2.3 30-Sep-2018  pgoyette Sync with HEAD
 1.66.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.66.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.68.2.1 10-Jun-2019  christos Sync with HEAD
 1.69.4.1 18-May-2020  martin Pull up following revision(s) (requested by maxv in ticket #915):

sys/netinet/igmp.c: revision 1.70

igmp_sendpkt() expects ip_output() to set 'imo.imo_multicast_ttl' into
'ip->ip_ttl'; but ip_output() won't if the target is not a multicast
address, meaning that the uninitialized 'ip->ip_ttl' byte gets sent to
the network. This leaks one byte of kernel heap.

Fix this by filling 'ip->ip_ttl' with a TTL of one.

Found by KMSAN.
 1.15 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.14 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.13 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.12 29-May-2014  rmind branches: 1.12.40;
Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.11 25-Dec-2007  perry branches: 1.11.54; 1.11.70;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.10 10-Dec-2005  elad branches: 1.10.46; 1.10.52; 1.10.56; 1.10.60;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.9 07-Aug-2003  agc branches: 1.9.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.8 20-Nov-1999  thorpej branches: 1.8.28;
Add the `packed' attribute to structures which describe wire protocol data.
 1.7 10-Feb-1998  perry branches: 1.7.14; 1.7.20;
add/cleanup multiple inclusion protection.
 1.6 31-May-1995  mycroft Implement IGMP v2. Based on the Multicast 3.5 distribution.
 1.5 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.4 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.3 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.2 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.1 06-Dec-1993  hpeyerl branches: 1.1.1;
multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.1.1.1 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.7.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.28.4 11-Dec-2005  christos Sync with head.
 1.8.28.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.28.2 18-Sep-2004  skrll Sync with HEAD.
 1.8.28.1 03-Aug-2004  skrll Sync with HEAD
 1.9.16.2 21-Jan-2008  yamt sync with head
 1.9.16.1 21-Jun-2006  yamt sync with head.
 1.10.60.1 02-Jan-2008  bouyer Sync with HEAD
 1.10.56.1 26-Dec-2007  ad Sync with head.
 1.10.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.10.46.1 09-Jan-2008  matt sync with HEAD
 1.11.70.1 10-Aug-2014  tls Rebase.
 1.11.54.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.12.40.1 03-Apr-2021  thorpej Sync with HEAD.
 1.27 17-Feb-2021  christos - pass the alignment instead of the mask (as Roy asked and to match the
other macro)
- use alignof to determine that alignment and CTASSERT what we expect
- remove unused macros
 1.26 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.25 14-Sep-2018  maxv branches: 1.25.12;
Use non-variadic function pointer in protosw::pr_input.
 1.24 29-May-2014  rmind branches: 1.24.26; 1.24.28;
Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.23 15-Apr-2008  thorpej branches: 1.23.48; 1.23.64;
Make IGMP stats per-cpu.
 1.22 10-Dec-2005  elad branches: 1.22.70;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.21 25-Apr-2004  simonb branches: 1.21.12;
Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.20 21-Apr-2004  itojun no space between function name and paren: foo (blah) -> foo(blah)
 1.19 18-Apr-2004  matt De __P()
 1.18 07-Oct-2003  mycroft There is also no reason to use arc4random() here.
 1.17 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.16 26-Jun-2003  itojun branches: 1.16.2;
purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.15 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.14 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
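
As a rough illustration of the alignment test described in this entry, the following C sketch shows one way such a predicate could be written. The names demo_hdr, DEMO_HDR_ALIGNED_P and DEMO_NO_STRICT_ALIGNMENT are hypothetical stand-ins, not the kernel's actual macros, and the struct is not the real IP header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a protocol header; only its alignment matters. */
struct demo_hdr {
	uint32_t word0;
	uint32_t word1;
};

/* The mask trick below relies on the alignment being a power of two. */
static_assert((_Alignof(struct demo_hdr) & (_Alignof(struct demo_hdr) - 1)) == 0,
    "alignment must be a power of two");

/*
 * Hypothetical analogue of a *_HDR_ALIGNED_P() macro: a constant 1 when
 * strict alignment is not required, otherwise a test of the low pointer bits.
 */
#ifdef DEMO_NO_STRICT_ALIGNMENT
#define DEMO_HDR_ALIGNED_P(p)	1
#else
#define DEMO_HDR_ALIGNED_P(p) \
	((((uintptr_t)(p)) & (_Alignof(struct demo_hdr) - 1)) == 0)
#endif
```

A caller would evaluate DEMO_HDR_ALIGNED_P(ptr) and take a copying path, such as the m_copyup() routine named in this entry, only when the test fails.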
 1.13 29-May-2002  itojun use arc4random
 1.12 12-May-2002  matt branches: 1.12.2; 1.12.4;
Eliminate commons.
 1.11 19-Nov-1999  bouyer branches: 1.11.4; 1.11.6; 1.11.8;
Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.10 10-Feb-1998  perry branches: 1.10.14; 1.10.20;
add/cleanup multiple inclusion protection.
 1.9 13-Feb-1996  christos netinet prototypes
 1.8 31-May-1995  mycroft Implement IGMP v2. Based on the Multicast 3.5 distribution.
 1.7 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 10-Jan-1994  mycroft Change the counters to be all the same type -- u_long.
 1.3 09-Jan-1994  mycroft Prototype the rest.
 1.2 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.1 06-Dec-1993  hpeyerl branches: 1.1.1;
multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.1.1.1 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.10.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.10.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.8.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.11.8.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.11.6.2 01-Aug-2002  nathanw Catch up to -current.
 1.11.6.1 20-Jun-2002  nathanw Catch up to -current.
 1.11.4.1 04-Aug-2003  msaitoh Pullup revision 1.16 (requested by itojun in ticket #50):
purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.12.4.1 30-Jun-2003  grant Pull up revision 1.16 (requested by itojun in ticket #1341):

purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.12.2.2 15-Jul-2002  gehenna catch up with -current.
 1.12.2.1 30-May-2002  gehenna Catch up with -current.
 1.16.2.4 11-Dec-2005  christos Sync with head.
 1.16.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.2.1 03-Aug-2004  skrll Sync with HEAD
 1.21.12.1 21-Jun-2006  yamt sync with head.
 1.22.70.1 02-Jun-2008  mjf Sync with HEAD.
 1.23.64.1 10-Aug-2014  tls Rebase.
 1.23.48.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.24.28.1 10-Jun-2019  christos Sync with HEAD
 1.24.26.1 30-Sep-2018  pgoyette Sync with HEAD
 1.25.12.1 03-Apr-2021  thorpej Sync with HEAD.
 1.248 20-Aug-2024  ozaki-r inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.
 1.247 25-Nov-2022  knakahara branches: 1.247.2; 1.247.8;
Support explicit unnumbered interface.

Currently, NetBSD supports implicit unnumbered interface by setting
the same IP address to two interfaces. However, such interface is not
treated as unnumbered when one of the interfaces is being changed and
has had its IP address changed. That behavior can be harmful for some
routing daemons.
 1.246 19-Nov-2022  yamt Make arp have its own mowner

This helped me to debug mbuf leaks in arp.
(if_arp.c rev. 1.298)
 1.245 17-Nov-2022  knakahara Fix sending broken RTM_DELADDR message in some operations.

Here is a minimum reproduction operation.
 1.244 04-Nov-2022  ozaki-r inpcb: use in_port_t for port numbers
 1.243 20-Sep-2022  knakahara Remove routes on an address removal if the routes reference the address. Implemented by ozaki-r@n.o.

A route whose gateway is on a connected route can become invalid if the
connected route is deleted, i.e., an associated address is removed.
Traditionally NetBSD doesn't sweep such a route on the address removal. Sending
packets over the route fails with "No route to host". Also the route holds an
orphan ifaddr as rt_ifa that is destroyed, say, by in_purgeaddr.

If the same address is assigned again in such a state, there can be two
different ifaddr objects with the same address. Until recently this was not a
big problem because we could send packets anyway. However, after MP-ification
of the network stack, we can't send packets because we strictly check if rt_ifa
(i.e., the (old) ifaddr) is valid.

This change automatically removes such routes on a removal of an associated
address to avoid keeping inconsistent routes.
 1.242 21-Sep-2021  christos don't opencode kauth_cred_get()
 1.241 29-Sep-2020  roy inet: Treat LINK_STATE_UNKNOWN as LINK_STATE_UP when changing

It's something we have always done.
it's really rare for anything to transition to UNKNOWN from either
UP or DOWN, but technically it is possible.
 1.240 11-Sep-2020  roy inet: Add SIOCGNBRINFO to retrieve neighbor state about an address
 1.239 11-Sep-2020  roy in: No need to set expire here anymore
 1.238 29-Aug-2020  christos Partially revert previous: set RTF_HOST regardless of mask for point-to-point
links. Unbreaks IPSEC/L2TP configurations.
 1.237 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.236 18-Dec-2019  roy inet: Add support for IPv4 /31 prefixes, as described in RFC 3021.

To run a /31 network, participating hosts MUST drop support
for directed broadcasts, and treat the first and last addresses
on subnet as unicast. The broadcast address for the prefix
should be the link local broadcast address, INADDR_BROADCAST.

Taken from FreeBSD, r226402.
Fixes PR kern/51388.
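
The /31 rule in this entry can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the kernel's code: demo_broadcast_addr and the DEMO_* constants are hypothetical names, and all values are in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Limited broadcast address 255.255.255.255, in host byte order. */
#define DEMO_INADDR_BROADCAST	UINT32_C(0xffffffff)
/* Netmask of a /31 prefix, in host byte order. */
#define DEMO_MASK_31		UINT32_C(0xfffffffe)

/*
 * Hypothetical helper: choose the broadcast address for addr/mask.
 * Both arguments are in host byte order.
 */
static uint32_t
demo_broadcast_addr(uint32_t addr, uint32_t mask)
{
	uint32_t host_bits = ~mask;

	/* A valid netmask has contiguous ones, so host_bits is 2^n - 1. */
	assert((host_bits & (host_bits + 1)) == 0);

	if (mask == DEMO_MASK_31)
		return DEMO_INADDR_BROADCAST;	/* no directed broadcast on /31 */
	return addr | host_bits;		/* ordinary directed broadcast */
}
```

For 192.0.2.0/24 the helper yields 192.0.2.255, while for either address of 192.0.2.0/31 it yields 255.255.255.255, leaving both addresses of the pair usable as unicast.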
 1.235 25-Sep-2019  ozaki-r Make panic messages more informative
 1.234 29-Apr-2019  roy branches: 1.234.2;
rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.233 29-Nov-2018  ozaki-r Don't run DAD on link-up if it's explicitly disabled
 1.232 29-Nov-2018  ozaki-r Introduce and use ip_dad_enabled() and ip6_dad_enabled() functions
 1.231 13-May-2018  khorben branches: 1.231.2;
Fix spello in a comment
 1.230 24-Apr-2018  knakahara Fix sys/netinet/in.c:r1.229 problem. I have missed FALLTHROUGH, sorry.
 1.229 20-Apr-2018  knakahara SIOCSIFDSTADDR uses struct ifreq instead of struct ifaddr or struct in_aliasreq.

SIOCSIFDSTADDR is not used by base package commands...

I checked sys/net*/* only.
 1.228 08-Apr-2018  christos Protect ip_dad_count with if NARP > 0 to fix compilation
 1.227 06-Apr-2018  ozaki-r Make GARP work again when DAD is disabled

The change avoids setting an IP address tentative on initializing it when the
IPv4 DAD is disabled (net.inet.ip.dad_count=0), which allows a GARP packet to be
sent (see arpannounce). This is the same behavior of NetBSD 7, i.e., before
introducing the IPv4 DAD.

Additionally do the same change to IPv6 DAD for consistency.

The change is suggested by roy@
 1.226 06-Apr-2018  ozaki-r Revert the previous two commits as per roy@'s request

It broke the ip_dad_count > 0 case unexpectedly.
 1.225 06-Apr-2018  ozaki-r Don't set IN_IFF_* flags to ia4_flags if DAD is disabled

This fix allows a GARP packet to be sent when adding an IP address to an
interface with IFF_UP on a kernel with IPv4 DAD disabled
(net.inet.ip.dad_count=0), which is the same behavior of NetBSD 7, i.e.,
before introducing the IPv4 DAD.
 1.224 06-Apr-2018  ozaki-r Simplify; clear then set flags to ia4_flags (NFCI)
 1.223 06-Mar-2018  ozaki-r Use pool(9) for llentry allocations

llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.222 06-Mar-2018  ozaki-r Fix memory leaks on arp -d and ndp -d for static entries

We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.
 1.221 06-Mar-2018  ozaki-r Fix reference leaks of llentry

callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).

While here, we can remove remaining abuses of mutex_owned for softnet_lock.
 1.220 06-Mar-2018  ozaki-r Add assertions

We must not destroy llentries holding mbufs.
 1.219 24-Feb-2018  ozaki-r branches: 1.219.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.218 14-Feb-2018  maxv Remove IFF_STATICARP, we don't support this, and the code is useless in its
current form.

ok ozaki-r@
 1.217 08-Feb-2018  ozaki-r Don't call lltable_purge_entries from in_if_down if ARP isn't enabled

Reported by bouyer@
 1.216 19-Jan-2018  ozaki-r Suppress noisy debugging outputs

Even if DEBUG they are too noisy under load.
 1.215 15-Jan-2018  ozaki-r Remove extra pserialize_perform from in_purgeaddr

It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
 1.214 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.213 27-Dec-2017  ozaki-r Don't pass rwlock to callout_halt
 1.212 25-Dec-2017  ozaki-r Fix wrong usage of psref_held

We can't use it for checking if a caller does NOT hold a given target.
If you want to do it you should have psref_not_held or something.
 1.211 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.210 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
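
The idea of hiding the NET_MPSAFE switch behind macros could look roughly like the sketch below. Every name here (DEMO_NET_MPSAFE, DEMO_LOCK_UNLESS_MPSAFE, demo_kernel_lock and so on) is a hypothetical stand-in chosen for illustration, and the "lock" is only a counter.

```c
/* Hypothetical lock state; a real kernel lock is far more involved. */
static int demo_lock_depth;
static int demo_lock_acquisitions;

static void
demo_kernel_lock(void)
{
	demo_lock_depth++;
	demo_lock_acquisitions++;
}

static void
demo_kernel_unlock(void)
{
	demo_lock_depth--;
}

/*
 * The #ifndef lives in exactly one place. Call sites use the macros and
 * stay free of conditional compilation.
 */
#ifndef DEMO_NET_MPSAFE
#define DEMO_LOCK_UNLESS_MPSAFE()	demo_kernel_lock()
#define DEMO_UNLOCK_UNLESS_MPSAFE()	demo_kernel_unlock()
#else
#define DEMO_LOCK_UNLESS_MPSAFE()	do { } while (0)
#define DEMO_UNLOCK_UNLESS_MPSAFE()	do { } while (0)
#endif

/* Hypothetical call site standing in for a network-stack code path. */
static int
demo_forward_packet(int len)
{
	int result;

	DEMO_LOCK_UNLESS_MPSAFE();
	result = len;	/* placeholder for the protected work */
	DEMO_UNLOCK_UNLESS_MPSAFE();
	return result;
}
```

With this shape, searching for the macro names finds every place where a global lock is still taken, which is the identification benefit the entry describes.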
 1.209 10-Nov-2017  ozaki-r Fix a deadlock between a route update and lltable

It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.

If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.

A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update

Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.

Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.208 10-Nov-2017  ozaki-r Remove redundant KASSERTMSG

The function is static, has just one caller and the caller does the same check.
 1.207 10-Aug-2017  ryo Add IP_PKTINFO support for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.206 04-Aug-2017  uwe Fix it's -> its in a comment.
 1.205 22-Jun-2017  ozaki-r Purge ARP/NDP entries on an interface when the interface is down

Fix PR kern/51179
 1.204 22-Jun-2017  ozaki-r Fix in_lltable_match_prefix

The function has not been used but will be used soon.
 1.203 01-Jun-2017  chs branches: 1.203.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.202 25-May-2017  ozaki-r Fix that a fresh in_ifaddr is unexpectedly freed before activating it

An in_ifaddr object is initialized with refcnt=0 and the refcnt
is incremented when being enqueued to the lists. However before
enqueuing it, in_ifinit can hold and release a reference to
it, i.e., call ifaref and ifafree, resulting in that the object
is freed in ifafree because its refcnt is decremented to 0.

It can be reproduced by doing:
ifconfig tun0 create
ifconfig tun1 create
ifconfig tun0 10.1 10.2
ifconfig tun1 10.2 10.1
ifconfig # Cause a kernel panic (may depend on environments)

We need to initialize a created in_ifaddr object with refcnt=1
to make the object survive over in_ifinit.

The issue is found by ryo@
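
The reference-count problem described in this entry can be modelled with a small C sketch. The struct and functions below (demo_ifaddr, demo_ref, demo_unref, demo_init) are hypothetical and only mimic the hold/release pattern; the "freed" flag stands in for actually releasing memory.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_ifaddr {
	int  refcnt;
	bool freed;	/* set instead of really releasing memory */
};

static void
demo_ref(struct demo_ifaddr *ifa)
{
	ifa->refcnt++;
}

static void
demo_unref(struct demo_ifaddr *ifa)
{
	assert(ifa->refcnt > 0);
	if (--ifa->refcnt == 0)
		ifa->freed = true;	/* last reference dropped */
}

/* Models an initialization step that temporarily holds a reference. */
static void
demo_init(struct demo_ifaddr *ifa)
{
	demo_ref(ifa);
	/* ... work that needs the object to stay alive ... */
	demo_unref(ifa);
}
```

An object created with refcnt=0 is marked freed by the temporary hold/release inside demo_init(), whereas one created with refcnt=1 survives until its creator drops that initial reference.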
 1.201 12-May-2017  ryo replace in_fmtaddr() by IN_PRINT(), and delete function in_fmtaddr()
 1.200 28-Apr-2017  ozaki-r Don't output debugging logs just if DIAGNOSTIC

Also make log messages informative.
 1.199 17-Mar-2017  roy branches: 1.199.4;
Add the local route after finishing the configuration of the address.
This fixes the issue where the initial address announced had an
invalid broadcast address.
 1.198 02-Mar-2017  ozaki-r Protect ia_allhosts by in_ifaddr_lock
 1.197 23-Jan-2017  ozaki-r Replace some splnet with splsoftnet
 1.196 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.195 02-Jan-2017  christos branches: 1.195.2;
- You can't just call the pfil hook to remove an address before an address
is removed! Hold a reference instead, remove it, and then free it.
- GC iatoifa()
 1.194 31-Dec-2016  ryo In the case of SIOCDIFADDR, call pfil_run_addrhooks before release ia.
 1.193 27-Dec-2016  ozaki-r Fix panic in pfil_run_hooks on bootup

XXX a kernel with pf still fails to boot up. Please someone fix it.
 1.192 26-Dec-2016  knakahara pserialize_perform() requires additional serialization; see pserialize(9).

ok by ozaki-r@n.o.
 1.191 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.190 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.189 06-Dec-2016  knakahara add API to manipulate ifa->ia_hash and ia_hash_pslist_entry, and fix ia_hash_pslist_entry race by using them.

in_ifaddr_lock is required before writing ifa->ia_hash and
ia_hash_pslist_entry to serialize writer processings.

reviewed by ozaki-r@n.o.
 1.188 18-Nov-2016  knakahara We must use PSLIST_ENTRY_DESTROY after PSLIST_WRITER_REMOVE and waiting for all readers to finish.

And then, if we want to re-insert the removed pslist element, we need to
call PSLIST_ENTRY_INIT again.

advised by riastradh@n.o and reviewed by ozaki-r@n.o, thanks.
 1.187 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.186 01-Oct-2016  roy Default netmask to /32 for INET on POINTOPOINT links if not specified.
 1.185 29-Sep-2016  roy in_ifscrub is no longer needed.
 1.184 29-Sep-2016  roy Set dstaddr in in_ifinit so that sppp consumers announce the correct
dstaddr in routing messages.
 1.183 29-Sep-2016  roy When changing an address via in_ifinit, ensure that the old address
is correctly scrubbed.
This allows sppp consumers to announce removal of the old address.
 1.182 16-Sep-2016  roy Drop hostIsNew from in_ifinit, let the function work out if the address
has changed.
Sync address flag setup with the IPv6 counterpart.
When scrubbing the address, or setting up the address fails, restore the
old address flags as well as the old address.
 1.181 13-Sep-2016  christos revert previous, roy says it breaks DaD.
 1.180 13-Sep-2016  christos When initializing addresses, reset the interface flags to 0. This fixes
an issue where point to point addresses that started down, and then came
up, were left with stale flags on one side of the point to point link.
 1.179 01-Sep-2016  ozaki-r Apply psz/psref to remaining IFADDR_READER_FOREACH

Pointed out by ryo@
 1.178 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.177 28-Jul-2016  ozaki-r Fix panic on adding/deleting IP addresses under network load

Adding and deleting IP addresses aren't serialized with other network
operations, e.g., forwarding packets. So if we add or delete an IP
address under network load, a kernel panic may happen on manipulating
network-related shared objects such as rtentry and rtcache.

To avoid such panics, we still need to hold softnet_lock in in_control
and in6_control that are called via ioctl and do network-related operations
including IP address additions/deletions.

Fix PR kern/51356
 1.176 20-Jul-2016  ozaki-r Get rid of unnecessary satosin
 1.175 14-Jul-2016  christos branches: 1.175.2;
provide net.inet.multicast, like we have net.inet6.multicast to be used
by netstat.
 1.174 13-Jul-2016  ozaki-r Get rid of wrongly added TAILQ_INSERT_TAIL
 1.173 08-Jul-2016  ozaki-r Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.172 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.171 06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.
 1.170 06-Jul-2016  ozaki-r Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.
 1.169 30-Jun-2016  ozaki-r Make sure that ifaddr is published after its initialization finished

Basically we should insert an item to a collection (say a list) after
item's initialization has been completed to avoid accessing an item
that is initialized halfway. ifaddr (in{,6}_ifaddr) isn't processed
like so and needs to be fixed.

In order to do so, we need to tweak {arp,nd6}_rtrequest that depend
on that an ifaddr is inserted during its initialization; they explore
interface's address list to determine that rt_getkey(rt) of a given
rtentry is in the list to know whether the route's interface should
be a loopback, which doesn't work after the change. To make it work,
first check RTF_LOCAL flag that is set in rt_ifa_addlocal that calls
{arp,nd6}_rtrequest eventually. Note that we still need the original
code for the case to remove and re-add a local interface route.
 1.168 23-Jun-2016  ozaki-r Fix typo in a comment
 1.167 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime is different from that of an ifnet, so an ifnet object
obtained via them can disappear at any time. Thus, for safety, we need to
look up an ifnet object by if_index every time.
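
The index-instead-of-pointer approach in this entry can be illustrated with the C sketch below. It is a simplified model, not the kernel's implementation: demo_ifnet, demo_moptions, demo_if_table and demo_if_byindex are hypothetical names, and it ignores the locking or reference counting a real lookup would need.

```c
#include <stddef.h>

#define DEMO_MAX_IF	8

struct demo_ifnet {
	int index;
};

/* Hypothetical interface table; a NULL slot means "no such interface". */
static struct demo_ifnet *demo_if_table[DEMO_MAX_IF];

/* Long-lived options store only the index, never a pointer. */
struct demo_moptions {
	int if_index;
};

/* Look the interface up on every use; returns NULL once it is detached. */
static struct demo_ifnet *
demo_if_byindex(int idx)
{
	if (idx < 0 || idx >= DEMO_MAX_IF)
		return NULL;
	return demo_if_table[idx];
}
```

Because the options hold no pointer, detaching an interface (clearing its table slot) cannot leave them with a dangling reference; the next lookup simply fails and the caller can handle the missing interface.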
 1.166 27-May-2016  christos make hostzerobroadcast default to "no".
 1.165 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist, while cloning routes still exist
but are renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
  - RTF_CONNECTED is added
    - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.164 25-Feb-2016  ozaki-r Use callout_halt instead of callout_stop
 1.163 26-Nov-2015  ozaki-r Fix build dependency of if_llatbl.c

if_llatbl.c is required if inet or inet6 is enabled. Depending on ether
doesn't suit for NDP case.
 1.162 16-Nov-2015  ozaki-r Add missing rtfree
 1.161 31-Aug-2015  ozaki-r Fix building kernels w/o ether
 1.160 31-Aug-2015  ozaki-r Fix building kernels w/o DIAGNOSTIC
 1.159 31-Aug-2015  ozaki-r Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
  - ARP specific data are stored in the hashed list
    of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
  - the global timer callout with the big locks can be
    removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
  - it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
  - it was a parameter that prevents expiration of active caches
  - Removed to simplify the timer logic, but we may be able to
    restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.158 31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.
 1.157 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.156 16-May-2015  roy Separate ARP handling DAD from inet.
This is done by signalling the intent to try tentative addresses
and then clearing the intent once the address is setup.
When the ARP handler is installed (arp_ifinit) then it adds
dad start and stop functions to the address which are used instead
of calling ARP directly.
 1.155 05-May-2015  roy If we don't have ARP, don't set IN_IFF_TENTATIVE.
 1.154 02-May-2015  joerg Fix !ARP build.
 1.153 02-May-2015  roy Appease gcc.
 1.152 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.151 26-Feb-2015  roy Don't add local routes for the any address or p2p addresses where the address matches the destination.
 1.150 26-Feb-2015  roy Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.
 1.149 01-Dec-2014  christos Don't pass junk in sin_family and sin_len for SIOCGIFNETMASK, and explain why.
XXX: pullup 7?
 1.148 09-Sep-2014  rmind branches: 1.148.2;
Eliminate IFAREF() and IFAFREE() macros in favour of functions.
 1.147 01-Jul-2014  rtr branches: 1.147.2;
fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.146 29-May-2014  rmind Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.145 22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.144 29-Jun-2013  rmind branches: 1.144.4;
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.143 08-Jun-2012  gdt branches: 1.143.2; 1.143.4;
Simply use the ifa_addr pointer, rather than taking its address.
Resolves failure to match addresses in SIOC[GD]LIFADDR calls.
Diagnosis and fix is due to Mark Keaton of BBN.
 1.142 12-Dec-2011  roy branches: 1.142.2;
When adding or scrubbing a prefix, always notify userland even if the
prefix does not have IFA_ROUTE.
Don't scrub the interface in SIOCAIFADDR if the new address doesn't
have IFA_ROUTE. If more functions are added to in_ifscrub then this logic
might need to be revisited.

Fixes PR/26450.
 1.141 19-Nov-2011  tls branches: 1.141.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.140 28-Oct-2011  dyoung branches: 1.140.2;
Remove the #if 1 / #endif around some code that appears to be
responsible for deleting the 'first' AF_INET address on the interface if the
target address has family == AF_UNSPEC.
 1.139 19-Oct-2011  dyoung Use if_addr_init() and if_mcast_op() instead of ifp->if_ioctl().
 1.138 15-May-2010  oki Backout rev.1.137. It causes troubles, see PR kern/43294.
We need more discussion and a more general solution.
 1.137 12-Mar-2010  oki branches: 1.137.2;
Fixed a number of race conditions when receiving IPv4 packets.
Found by the IIJ SEIL team.
 1.136 07-Dec-2009  dyoung branches: 1.136.2;
Initialize/compare pointers with NULL instead of 0.
 1.135 11-Sep-2009  dyoung Make ifconfig(8) set and display preference numbers for IPv6
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.

In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.

Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.

In support of the changes above,

1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().

2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.

3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
 1.134 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.133 18-Mar-2009  cegger bcopy -> memcpy
 1.132 18-Mar-2009  cegger bzero -> memset
 1.131 12-Feb-2009  christos PR/40603: Christoph Badura: unprivileged users can add and delete interface
link addresses. Fixed by centralizing the test as suggested. Will pull up
to 5.0 once submitter tests the fix.
 1.130 21-Dec-2008  roy branches: 1.130.2;
The automatic addition of a subnet route should not error if a manually
added route already exists. Fixes PR kern/40133.
 1.129 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.128 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
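
The address check described above can be sketched as follows. This is not
the ether_ioctl() code; ex_ether_addr_acceptable and EX_ETHER_ADDR_LEN are
hypothetical names. It relies only on the fact that multicast and broadcast
ethernet addresses have the low-order bit of the first octet set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_ETHER_ADDR_LEN 6

/* Return true if addr may be assigned as a unicast link-layer address. */
static bool
ex_ether_addr_acceptable(const uint8_t *addr)
{
	static const uint8_t zero[EX_ETHER_ADDR_LEN] = { 0 };

	assert(addr != NULL);
	if (addr[0] & 0x01)	/* group bit: multicast or broadcast */
		return false;
	if (memcmp(addr, zero, EX_ETHER_ADDR_LEN) == 0)
		return false;	/* 00:00:00:00:00:00 */
	return true;
}
```

A caller in this style would presumably reject the request with an error
before touching the interface; which error code is returned is not stated
in this log.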

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.127 25-Sep-2008  pooka branches: 1.127.2; 1.127.4;
Don't wrap whole file in INET.
 1.126 11-May-2008  dyoung branches: 1.126.4;
Cosmetic: compare sa_family with AF_UNSPEC instead of testing truth.
Join a line. Compare sa_len with 0 instead of testing truth.
 1.125 28-Apr-2008  martin branches: 1.125.2;
Remove clause 3 and 4 from TNF licenses
 1.124 10-Apr-2008  dyoung branches: 1.124.2; 1.124.4;
s/8/NBBY/
 1.123 06-Feb-2008  matt branches: 1.123.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
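
For readers unfamiliar with the technique, a plain Fisher-Yates shuffle is
sketched below. It is not the ip_id code from this revision: the sliding
window is omitted, arc4random is replaced by a small xorshift generator so
the example is deterministic, and the names ex_xorshift32 and ex_shuffle
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy generator for the example only; not suitable for real ID generation. */
static uint32_t
ex_xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Shuffle v[0..n-1] in place; every element is swapped at most once per step. */
static void
ex_shuffle(uint16_t *v, size_t n, uint32_t *state)
{
	assert(*state != 0);	/* xorshift must not be seeded with zero */
	for (size_t i = n; i > 1; i--) {
		/* Pick j in [0, i); modulo bias is ignored for brevity. */
		size_t j = ex_xorshift32(state) % i;
		uint16_t tmp = v[i - 1];

		v[i - 1] = v[j];
		v[j] = tmp;
	}
}
```

The result is always a permutation of the input, so an ID scheme built this
way hands out each value once before any value repeats.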
 1.122 06-Dec-2007  dyoung Use ifa_insert(), ifa_remove().
 1.121 05-Dec-2007  dyoung Extract common code, creating a subroutine if_purgeaddrs(ifp,
family, purgeaddr) which applies function `purgeaddr' to each
address on `ifp' belonging to `family'.
 1.120 05-Dec-2007  dyoung Use IFADDR_FIRST() and IFADDR_NEXT().
 1.119 09-Nov-2007  dyoung branches: 1.119.2;
KNF. Remove superfluous casts and parentheses.
 1.118 01-Sep-2007  dyoung branches: 1.118.4; 1.118.6;
Use ifreq_setaddr(), ifreq_getaddr(), sockaddr_in_init(), and
sockaddr_copy(). Constify. Compare pointers with NULL, not 0.
Don't "test truth" of pointers, but compare with NULL.
 1.117 15-Apr-2007  dyoung branches: 1.117.2; 1.117.6; 1.117.8;
Cosmetic: shorten a staircase. bzero -> memset. KNF.
 1.116 04-Mar-2007  christos branches: 1.116.2; 1.116.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.115 16-Nov-2006  christos branches: 1.115.4;
__unused removal on arguments; approved by core.
 1.114 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
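
The address predicates from item 3 can be approximated as below. This is a
sketch, not the macros added to the tree: the ex_* names are hypothetical,
addresses are taken in host byte order, and only the link-local unicast
(169.254/16) and RFC1918 tests are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 169.254/16, link-local unicast. */
static bool
ex_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* RFC1918 private ranges: 10/8, 172.16/12, 192.168/16. */
static bool
ex_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||
	    (a & 0xfff00000u) == 0xac100000u ||
	    (a & 0xffff0000u) == 0xc0a80000u;
}

/* Usage example: 169.254.1.1 is link-local, 192.168.0.1 is private. */
static void
ex_predicates_demo(void)
{
	assert(ex_in_linklocal(0xa9fe0101u));
	assert(ex_in_private(0xc0a80001u));
}
```

A source-address selection policy could use tests like these to rank
candidate addresses, for example preferring a private address over a
link-local one; the actual ranking rules are described in in_getifa(9).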
 1.113 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER.
 1.112 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.111 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.110 23-Sep-2006  elad PR/29766: Chris Ross: Incomplete correction of comments in netinet/in.c
Patch applied, thanks!
 1.109 23-Jul-2006  ad branches: 1.109.4; 1.109.6;
Use the LWP cached credentials where sane.
 1.108 14-May-2006  elad integrate kauth.
 1.107 10-May-2006  mrg quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..
 1.106 11-Dec-2005  christos branches: 1.106.4; 1.106.6; 1.106.8; 1.106.10; 1.106.12;
merge ktrace-lwp.
 1.105 28-Sep-2005  seanb - Close NULL dereference when a GIFALIAS is performed on
a nonexistent address.
- Code review: christos
 1.104 26-Feb-2005  perry branches: 1.104.2; 1.104.4; 1.104.6;
nuke trailing whitespace
 1.103 03-Feb-2005  perry ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.102 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.101 24-Jan-2005  matt branches: 1.101.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.100 08-Aug-2004  yamt branches: 1.100.4;
in_control: fix address leaks on error, which causes a panic
("no domain for AF 0") on if_detach.
- SIOCAIFADDR, SIOCSIFADDR: free an address on error.
- SIOCSIFNETMASK, SIOCSIFDSTADDR: reject operations for an interface which
has no AF_INET addresses.

partly from OpenBSD and FreeBSD.
reviewed by Christos Zoulas on tech-net@.
 1.99 26-Jul-2004  yamt call PFIL_NEWIF hooks at a correct place.
(on SIOCAIFADDR rather than SIOCGIFALIAS.)

from Peter Postma, PR/26402.
ok'ed by itojun.
 1.98 18-Jul-2004  yamt fix typos. PFIL_HOOK -> PFIL_HOOKS
 1.97 07-Jul-2004  mycroft Fix SIOCSIFNETMASK -- it needs to use in_ifscrub() and in_ifinit() to update
the interface route and various internal state. Also, it should use an ifreq,
not an if_aliasreq. Addresses PR 9604. (Nothing in our source tree uses
SIOCSIFNETMASK, though. Perhaps it should be deprecated.)
 1.96 22-Jun-2004  itojun prepare PF-related hooks. reviewed by matt, perry, christos
 1.95 30-May-2004  itojun fix SIOC*LIFADDR for IPv4. markus friedl
 1.94 21-Apr-2004  itojun kill sprintf, use snprintf
 1.93 11-Nov-2003  jonathan branches: 1.93.2;
Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
 1.92 23-Oct-2003  mycroft Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
 1.91 16-Aug-2003  itojun do not disconnect L4 connections on IP address removal. the behavior
is too extreme (consider DHCP/PPP-based fixed address allocation).
see tech-net for more info.
 1.90 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.89 26-Jun-2003  itojun branches: 1.89.2;
cleanup multicast routing stuff on if_detach().
XXX sideeffect to running instance of multicast routing daemon unknown
 1.88 26-Jun-2003  itojun put meaningful count into in_multientries.
(or we could remove this variable - no one seems to use it)
 1.87 26-Jun-2003  itojun purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.86 26-Jun-2003  itojun tabify
 1.85 18-Jun-2003  itojun install host route for p2p interface even if there's connected net route
by broadcast interface. PR 21903.
 1.84 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.83 13-Jun-2003  onoe For loopback interface, assign ia_dstaddr instead of just changing reference
to ifa_dstaddr. This fixes the problem that assigning more than 2 IPv4
aliases to loopback interface fails to create routing table entry.
 1.82 07-Nov-2002  thorpej Fix signed/unsigned comparison warnings.
 1.81 22-Oct-2002  simonb "newifaddr" in in_control() was set but never used, remove it.
 1.80 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.79 04-Sep-2002  itojun backout 1.78, ioctl(SIOCSIFADDR) is needed to test if the interface
supports AF_INET or not
 1.78 04-Sep-2002  itojun avoid SIOCSIFADDR if there's an IPv4 address already.
the comment doesn't match the behavior, it seems that the code assumed that
there's only one IPv4 address on an interface. sync w/kame
 1.77 09-Jun-2002  itojun whitespace
 1.76 09-May-2002  itojun branches: 1.76.2; 1.76.4;
backout 1.72. it is not correct for the kernel to remove routes by itself,
and the code was buggy (dereferenced null pointer when IFAFREE removes the
route).
 1.75 30-Mar-2002  itojun do not consider /32 address itself as broadcast.
with /32 address, in_addr == in_broadaddr.
 1.74 01-Mar-2002  thorpej In in_savemkludge() and in_restoremkludge(), don't insert into a new
list without removing from the old one first.

From Matt Thomas.
 1.73 21-Feb-2002  christos Sean amended his patch not to include the IFAFREE()
 1.72 21-Feb-2002  christos PR/15662: Sean Boudreau: make sure we clean all routes of an interface when
we change its ip address.
 1.71 13-Nov-2001  lukem add RCSIDs
 1.70 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.69 16-Sep-2001  martin branches: 1.69.2;
Fix typo in comment.
 1.68 27-Jul-2001  itojun branches: 1.68.2;
do not check in_dstaddr on in_{add,scrub}prefix, otherwise linklocal
address manipulation could choke. sync with kame
 1.67 22-Jul-2001  itojun manage IFA_ROUTE on interface address better, so that we can
provide better support for multiple addresses with the same prefix.
(like 10.0.0.1/8 and 10.0.0.2/8 on the same interface)
continuation of PR 13311.

remove irrelevant #if 0'ed segment for PR 10427.
 1.66 13-Apr-2001  thorpej branches: 1.66.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.65 08-Oct-2000  enami branches: 1.65.2;
- Keep track of allhost multicast address record we joined into
each in_ifaddr and delete it when an address is purged.
- Don't simply try to delete a multicast address record listed in the
ia_multiaddrs. Doing so leaves a dangling pointer. Let whoever holds a
reference to it delete it.
 1.64 08-Oct-2000  itojun implement multicast kludge table for IPv4.
- when all the interface addresses are removed from an interface and there are
multicast groups still joined, keep them in the kludge table.
- when an interface address is added again, recover the multicast groups from
the kludge table.
this avoids the problem of a dangling in_ifaddr on pcmcia card removal,
due to the link from multicast group info (in_multi).

the code is basically from sys/netinet6/in6.c (jinmei@kame).

pointed out by: Shiva Shenoy <shiva_s@yahoo.com>
 1.63 06-Oct-2000  itojun remove obsolete handling code for SIOCSIFPHY*. they are now in ifioctl().
sync with kame.
 1.62 02-Aug-2000  itojun inhibit error code from rtinit(). this happens when we try to assign
multiple addresses from same prefix, onto single interface. PR 10427.


more info:
- 4.4BSD did not check return code from in_ifinit() at all.
4.4BSD does not support multiple address from same prefix.
- past KAME change passed in{,6}_ifinit() to upwards, toward ifconfig(8).
the behavior is filed as PR 10427.
- the commit inhibits EEXIST from rtinit(), hence partially recovers old
4.4BSD behavior.
- the right thing to do is to properly support multiple address assignment
from the same prefix. KAME tree has a more extensive change, however, it needs
much more time to get stabilized (rtentry refcnt change can cause serious
issues, we really need to bake it before bringing it to netbsd)
 1.61 06-May-2000  mycroft branches: 1.61.4;
GC in_interfaces.
 1.60 03-Apr-2000  enami Bump the reference count of ifaddr while it is referred to through in_multi.
 1.59 30-Mar-2000  augustss Remove register declarations.
 1.58 21-Mar-2000  itojun improve comment (about undo'ing code on in{,6}_ifinit failure)
 1.57 18-Mar-2000  itojun #if 0'ed undo code for interface address addition failure.
it was a bit too strong, and forbids multiple addresses from
same prefix to be assigned.

now the behavior is the same as previous - memory leak on interface address
addition failure.
http://orange.kame.net/dev/query-pr.cgi?pr=218
 1.56 12-Mar-2000  itojun undo interface address addition attempt, when in_ifinit fails.
(this basically avoids memory leakage)
 1.55 06-Mar-2000  itojun allow SIOCDIFADDR with AF_UNSPEC address by default, until we fix ifconfig(8).
(should be COMPAT_43)
 1.54 25-Feb-2000  itojun allow AF_UNSPEC for SIOCDIFADDR. ISC DHCP client depends on this behavior.
 1.53 25-Feb-2000  itojun backout previous commit (sanity check for family) - it seems to be doing
something wrong. i'll revise it soon.
 1.52 25-Feb-2000  itojun reject non-AF_INET addresses on ioctl.
without this, we can configure invalid sockaddrs, for example,
sa_family == 0 (and we can never remove them!)
 1.51 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.50 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.49 12-Dec-1999  itojun do not use member of sockaddr_storage directly.
(if the definition is like in rfc2553) they are not supposed to be used.

XXX i'm trying to change rfc2553 sockaddr_storage definition to include
"ss_len" and "ss_family". see ipngwg. situation might change soon.
 1.48 01-Jul-1999  itojun branches: 1.48.2; 1.48.8;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.47 26-Jun-1999  sommerfeld If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
 1.46 19-Dec-1998  thorpej branches: 1.46.4; 1.46.6;
Reverse the copyright-notice-swap. It went against existing practice.
 1.45 30-Sep-1998  tls branches: 1.45.4;
Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.
 1.44 28-Sep-1998  christos SIOCGIFALIAS should not be restricted to the superuser.
 1.43 06-Sep-1998  christos Add SIOCGIFALIAS
 1.42 05-Jul-1998  jonathan Garbage-collect ``needs-flag'' from attributes ether, fddi, arc:
NETHER, NFDDI, NARC are not used anywhere. Remove #include "ether.h",
which had no effect.
Removes clash with "options NATM" for native-ATM network protocol stack.
 1.41 05-Jul-1998  jonathan defopt INET, NETATALK.
 1.40 29-May-1998  matt Change arp so its console log messages print out IP addresses in
dotted quad format instead of hex.
 1.39 15-Feb-1998  tls Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.
 1.38 13-Feb-1998  tls Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
 1.37 12-Jan-1998  scottr Use option header file for MROUTING
 1.36 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.35 23-Jul-1997  thorpej branches: 1.35.6;
Pull SYN_cache_branch down into the main line.
 1.34 15-Mar-1997  is branches: 1.34.2;
New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.33 14-Sep-1996  mrg branches: 1.33.4;
move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.
 1.32 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.31 07-Sep-1996  mrg fix a couple of minor nits after discussions with jason.
 1.30 06-Sep-1996  mrg add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.
 1.29 23-Jun-1996  mycroft Return ENOPROTOOPT rather than picking pseudo-random error values.
Don't allow SIOCGET{VIF,SG}CNT from sockets other than the multicast router.
Restructure rip_ctloutput() like ip_ctloutput(), and fix memory leaks.
 1.28 22-May-1996  mycroft A few style changes to match netiso and netns.
 1.27 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.26 13-Feb-1996  christos branches: 1.26.4;
netinet prototypes
 1.25 12-Aug-1995  mycroft splnet --> splsoftnet
 1.24 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.23 04-Jun-1995  mycroft For consistency, set sin_len for SIOC{ADD,DEL}MULTI.
 1.22 04-Jun-1995  mycroft Clean up many more casts.
 1.21 04-Jun-1995  mycroft Clean up a lot of ugly casts.
 1.20 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.19 31-May-1995  mycroft Integrate multicast 3.5 distribution, with several bugs fixed and general
cleanup. This is a (working) snapshot of work in progress.
 1.18 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.17 11-Apr-1995  mycroft Remove some explicit references to loif.
 1.16 10-Apr-1995  mycroft Remove now unneeded #ifdef. Prototype new function.
 1.15 03-Nov-1994  mycroft Fix off by one error in in_socktrim(), reported by Karn Fox.
 1.14 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.13 21-Jun-1994  chopps config.new hack for if_ether.c for lack of an `and' in the grammar
and protect some ether specific code in in.c
 1.12 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.11 14-Mar-1994  glass add missing explicit type declaration for func argument
 1.10 10-Feb-1994  mycroft Deprecate af.h.
 1.9 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.8 09-Jan-1994  mycroft Prototype the rest.
 1.7 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.6 18-Dec-1993  mycroft Canonicalize all #includes.
 1.5 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.4 10-Jun-1993  deraadt branches: 1.4.4;
patch from Yuval Yarom, sent to me by <andrew@werple.apana.org.au>
The check that the destination of a forwarded ip packet is not on
the loopback net is wrong, and will always fail. The following patch
fixes the problem.
[allows "route add $hostname localhost" to be added to /etc/netstart to
keep things for $hostname away from the ethernet driver]
 1.3 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.4.2 08-Nov-1993  mycroft Remove references to af.h.
 1.4.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.26.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.33.4.2 09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.33.4.1 07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.34.2.1 14-May-1997  mellon Keep track of maximum mtu of non-local interface
 1.35.6.1 01-Oct-1998  cgd pull up revisions 1.38-1.39, 1.45 (via patch) from trunk. (tls)
 1.45.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.46.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.46.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.46.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.46.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.48.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.48.2.2 21-Apr-2001  bouyer Sync with HEAD
 1.48.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.61.4.6 30-Nov-2003  he Pull up revision 1.89 via patch, requested by itojun in ticket #54:
Clean up multicast routing when an interface is detached.
 1.61.4.5 04-Aug-2003  msaitoh Pull up revision 1.87 (requested by itojun in ticket #50):
purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.61.4.4 03-Apr-2002  he Pull up revision 1.75 (requested by itojun):
Reject TCP SYN packets sent to the broadcast address.
 1.61.4.3 17-Oct-2000  tv Pullup 1.65 [enami]:
- Keep track of allhost multicast address record we joined into
each in_ifaddr and delete it when an address is purged.
- Don't simply try to delete a multicast address record listed in the
ia_multiaddrs. It results in a dangling pointer. Let whoever holds a
reference to it delete it.

Also 1.64 [itojun, req by enami]:
implement multicast kludge table for IPv4.
- when all the interface addresses are removed from an interface, and there are
multicast groups still left joined, keep them in kludge table.
- when an interface address is added again, recover multicast groups from
kludge table.
this will avoid problem with dangling in_ifaddr on pcmcia card removal,
due to the link from multicast group info (in_multi).
 1.61.4.2 06-Oct-2000  itojun pullup (approved by releng-1-5)
move privilege check for SIOCSIFPHY* from in{,6}_control to ifioctl.
fix privilege check mistakes (which allows non-root user to modify gif
physical address in some cases). sync with kame.
> cvs rdiff -r1.62 -r1.63 syssrc/sys/netinet/in.c
> cvs rdiff -r1.34 -r1.35 syssrc/sys/netinet6/in6.c
> cvs rdiff -r1.71 -r1.73 syssrc/sys/net/if.c
 1.61.4.1 18-Aug-2000  itojun pullup (approved by releng-1-5)
sys/netinet6/in6.c 1.33 -> 1.34
sys/netinet/in.c 1.61 -> 1.62

> inhibit error code from rtinit(). this happens when we try to assign
> multiple addresses from same prefix, onto single interface. PR 10427.
 1.65.2.10 11-Nov-2002  nathanw Catch up to -current
 1.65.2.9 17-Sep-2002  nathanw Catch up to -current.
 1.65.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.65.2.7 17-Apr-2002  nathanw Catch up to -current.
 1.65.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.65.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.65.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.65.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.65.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.65.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.66.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.66.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.66.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.66.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.66.2.1 03-Aug-2001  lukem update to -current
 1.68.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.69.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.76.4.4 02-Apr-2006  riz Pull up following revision(s) (requested by seanb in ticket #6040):
sys/netinet/in.c: revision 1.105
- Close NULL dereference when a GIFALIAS is performed on
a nonexistent address.
- Code review: christos
 1.76.4.3 02-Jul-2003  tron Apply patch (requested by itojun in ticket #1361):
Make it compilable.
 1.76.4.2 30-Jun-2003  grant Pull up revision 1.89 (requested by itojun in ticket #1342):

cleanup multicast routing stuff on if_detach().
 1.76.4.1 30-Jun-2003  grant Pull up revision 1.87 (requested by itojun in ticket #1341):

purge rti structure (in igmp.c) for removed ifp on if_detach().
 1.76.2.1 20-Jun-2002  gehenna catch up with -current.
 1.89.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.89.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.89.2.4 04-Feb-2005  skrll Sync with HEAD.
 1.89.2.3 12-Aug-2004  skrll Sync with HEAD.
 1.89.2.2 05-Aug-2004  skrll Fix merge mistakes.
 1.89.2.1 03-Aug-2004  skrll Sync with HEAD
 1.93.2.2 02-Apr-2006  riz Pull up following revision(s) (requested by seanb in ticket #10411):
sys/netinet/in.c: revision 1.105
- Close NULL dereference when a GIFALIAS is performed on
a nonexistent address.
- Code review: christos
 1.93.2.1 10-Jul-2004  tron branches: 1.93.2.1.2; 1.93.2.1.4;
Pull up revision 1.97 (requested by mycroft in ticket #615):
Fix SIOCSIFNETMASK -- it needs to use in_ifscrub() and in_ifinit() to update
the interface route and various internal state. Also, it should use an ifreq,
not an if_aliasreq. Addresses PR 9604. (Nothing in our source tree uses
SIOCSIFNETMASK, though. Perhaps it should be deprecated.)
 1.93.2.1.4.1 02-Apr-2006  riz Pull up following revision(s) (requested by seanb in ticket #10411):
sys/netinet/in.c: revision 1.105
- Close NULL dereference when a GIFALIAS is performed on
a nonexistent address.
- Code review: christos
 1.93.2.1.2.1 02-Apr-2006  riz Pull up following revision(s) (requested by seanb in ticket #10411):
sys/netinet/in.c: revision 1.105
- Close NULL dereference when a GIFALIAS is performed on
a nonexistent address.
- Code review: christos
 1.100.4.1 29-Apr-2005  kent sync with -current
 1.101.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.101.2.1 12-Feb-2005  yamt sync with head.
 1.104.6.1 02-Apr-2006  riz Pull up following revision(s) (requested by seanb in ticket #1235):
sys/netinet/in.c: revision 1.105
- Close NULL dereference when a GIFALIAS is performed on
a nonexistent address.
- Code review: christos
 1.104.4.6 11-Feb-2008  yamt sync with head.
 1.104.4.5 07-Dec-2007  yamt sync with head
 1.104.4.4 15-Nov-2007  yamt sync with head.
 1.104.4.3 03-Sep-2007  yamt sync with head.
 1.104.4.2 30-Dec-2006  yamt sync with head.
 1.104.4.1 21-Jun-2006  yamt sync with head.
 1.104.2.1 02-Apr-2006  riz Pull up following revision(s) (requested by seanb in ticket #1235):
sys/netinet/in.c: revision 1.105
- Close NULL dereference when a GIFALIAS is performed on
a nonexistent address.
- Code review: christos
 1.106.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.106.10.4 11-May-2006  elad sync with head
 1.106.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.106.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.106.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.106.8.2 11-Aug-2006  yamt sync with head
 1.106.8.1 24-May-2006  yamt sync with head.
 1.106.6.1 01-Jun-2006  kardel Sync with head.
 1.106.4.1 09-Sep-2006  rpaulo sync with head
 1.109.6.2 10-Dec-2006  yamt sync with head.
 1.109.6.1 22-Oct-2006  yamt sync with head
 1.109.4.1 18-Nov-2006  ad Sync with head.
 1.115.4.2 15-Apr-2007  yamt sync with head.
 1.115.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.116.4.1 11-Jul-2007  mjf Sync with head.
 1.116.2.2 09-Oct-2007  ad Sync with head.
 1.116.2.1 08-Jun-2007  ad Sync with head.
 1.117.8.3 23-Mar-2008  matt sync with HEAD
 1.117.8.2 09-Jan-2008  matt sync with HEAD
 1.117.8.1 06-Nov-2007  matt sync with HEAD
 1.117.6.3 09-Dec-2007  jmcneill Sync with HEAD.
 1.117.6.2 11-Nov-2007  joerg Sync with HEAD.
 1.117.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.117.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.118.6.3 18-Feb-2008  mjf Sync with HEAD.
 1.118.6.2 08-Dec-2007  mjf Sync with HEAD.
 1.118.6.1 19-Nov-2007  mjf Sync with HEAD.
 1.118.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.119.2.1 08-Dec-2007  ad Sync with head.
 1.123.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.123.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.123.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.124.4.4 11-Mar-2010  yamt sync with head
 1.124.4.3 16-Sep-2009  yamt sync with head
 1.124.4.2 04-May-2009  yamt sync with head.
 1.124.4.1 16-May-2008  yamt sync with head.
 1.124.2.1 18-May-2008  yamt sync with head.
 1.125.2.2 10-Oct-2008  skrll Sync with HEAD.
 1.125.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.126.4.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.126.4.1 19-Oct-2008  haad Sync with HEAD.
 1.127.4.4 22-Aug-2012  bouyer Pull up following revision(s) (requested by gdt in ticket #1776):
sys/netinet/in.c: revision 1.143
Simply use the ifa_addr pointer, rather than taking its address.
Resolves failure to match addresses in SIOC[GD]LIFADDR calls.
Diagnosis and fix is due to Mark Keaton of BBN.
 1.127.4.3 20-May-2010  snj Revert ticket 1357.
 1.127.4.2 29-Mar-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1357):
sys/netinet/in.c: revision 1.137 via patch
Fixed a number of race conditions in the case of receiving ipv4 packet.
found by iij seil team.
 1.127.4.1 09-Jan-2009  snj branches: 1.127.4.1.4;
Pull up following revision(s) (requested by roy in ticket #239):
sys/netinet/in.c: revision 1.130
The automatic addition of a subnet route should not error if a manually
added route already exists. Fixes PR kern/40133.
 1.127.4.1.4.1 21-Apr-2010  matt sync to netbsd-5
 1.127.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.127.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.127.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.130.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.136.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.136.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.137.2.1 30-May-2010  rmind sync with head
 1.140.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.140.2.2 30-Oct-2012  yamt sync with head
 1.140.2.1 17-Apr-2012  yamt sync with head
 1.141.2.1 18-Feb-2012  mrg merge to -current.
 1.142.2.1 13-Jun-2012  riz Pull up following revision(s) (requested by gdt in ticket #330):
sys/netinet/in.c: revision 1.143
Simply use the ifa_addr pointer, rather than taking its address.
Resolves failure to match addresses in SIOC[GD]LIFADDR calls.
Diagnosis and fix is due to Mark Keaton of BBN.
 1.143.4.2 28-Aug-2013  rmind sync with head
 1.143.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.143.2.2 03-Dec-2017  jdolecek update from HEAD
 1.143.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.144.4.1 10-Aug-2014  tls Rebase.
 1.147.2.1 14-Apr-2015  snj Pull up following revision(s) (requested by christos in ticket #689):
sys/netinet/in.c: revision 1.149
Don't pass junk in sin_family and sin_len for SIOCGIFNETMASK, and explain why.
XXX: pullup 7?
 1.148.2.12 28-Aug-2017  skrll Sync with HEAD
 1.148.2.11 05-Feb-2017  skrll Sync with HEAD
 1.148.2.10 05-Dec-2016  skrll Sync with HEAD
 1.148.2.9 05-Oct-2016  skrll Sync with HEAD
 1.148.2.8 09-Jul-2016  skrll Sync with HEAD
 1.148.2.7 29-May-2016  skrll Sync with HEAD
 1.148.2.6 22-Apr-2016  skrll Sync with HEAD
 1.148.2.5 19-Mar-2016  skrll Sync with HEAD
 1.148.2.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.148.2.3 22-Sep-2015  skrll Sync with HEAD
 1.148.2.2 06-Jun-2015  skrll Sync with HEAD
 1.148.2.1 06-Apr-2015  skrll Sync with HEAD
 1.175.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.175.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.175.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.175.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.175.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.195.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.199.4.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.199.4.1 02-May-2017  pgoyette Sync with HEAD - tag prg-localcount2-base1
 1.203.2.14 08-Oct-2020  martin Pull up following revision(s) (requested by roy in ticket #1613):

sys/netinet/in.c: revision 1.241
sys/netinet6/in6.c: revision 1.282

inet: Treat LINK_STATE_UNKNOWN as LINK_STATE_UP when changing

It's something we have always done.
it's really rare for anything to transition to UNKNOWN from either
UP or DOWN, but technically it is possible.
 1.203.2.13 09-Apr-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #701):
sys/netinet/in.c: 1.228
Protect ip_dad_count with if NARP > 0 to fix compilation
 1.203.2.12 08-Apr-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #701):
sys/netinet/in.c: 1.227
sys/netinet6/in6.c: 1.265
tests/net/arp/t_arp.sh: 1.35-1.36
Make GARP work again when DAD is disabled
The change avoids setting an IP address tentative on initializing it when
the IPv4 DAD is disabled (net.inet.ip.dad_count=0), which allows a GARP packet
to be sent (see arpannounce). This is the same behavior of NetBSD 7, i.e.,
before introducing the IPv4 DAD.
Additionally do the same change to IPv6 DAD for consistency.
The change is suggested by roy@
--
Improve packet checks and error reporting
--
Add tests for GARP without DAD
Additionally make the existing tests for GARP more explicit.
 1.203.2.11 13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #622):
sys/netinet/if_arp.c: revision 1.270
sys/net/if_llatbl.c: revision 1.24 (patch)
sys/net/if_llatbl.c: revision 1.25
sys/net/if_llatbl.c: revision 1.26
sys/net/route.c: revision 1.204
sys/netinet6/in6.c: revision 1.261
sys/netinet6/in6.c: revision 1.262 (patch)
sys/netinet6/in6.c: revision 1.263
sys/netinet/in.c: revision 1.216
sys/netinet6/in6.c: revision 1.264
sys/netinet6/nd6.c: revision 1.246 (patch)
sys/netinet/if_arp.c: revision 1.269
sys/net/if_llatbl.h: revision 1.14
sys/netinet6/in6.c: revision 1.259
sys/netinet/in.c: revision 1.220
sys/netinet/in.c: revision 1.221 (patch)
sys/netinet/in.c: revision 1.222
sys/netinet/in.c: revision 1.223

Suppress noisy debugging outputs
Even if DEBUG they are too noisy under load.

Tweak sanity checks

Scheduling a timer of static entries is wrong.

Add assertions

We must not destroy llentries holding mbufs.

Fix reference leaks of llentry
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricky though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.

Fix memory leaks on arp -d and ndp -d for static entries
We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there are no such entries now.

Use pool(9) for llentry allocations
llentry is easily leaked, and pool suits it because pool can be used to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.203.2.10 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #588):
sys/netinet6/in6.c: revision 1.260
sys/netinet/in.c: revision 1.219
sys/netinet/wqinput.c: revision 1.4
sys/rump/net/lib/libnetinet/netinet_component.c: revision 1.11
sys/netinet/ip_input.c: revision 1.376
sys/netinet6/ip6_input.c: revision 1.193
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.203.2.9 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #539):
sys/netinet/in.c: 1.217
Don't call lltable_purge_entries from in_if_down if ARP isn't enabled
Reported by bouyer@
 1.203.2.8 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.203.2.7 03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater waits for references
to a target route (except for a reference by the updater itself) to be released.
 1.203.2.6 13-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #488):
sys/netinet/in.c: revision 1.213
Don't pass rwlock to callout_halt
 1.203.2.5 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #463):
sys/netinet/in.c: revision 1.212
sys/netinet/ip_output.c: revision 1.288
sys/netinet6/in6.c: revision 1.256
sys/netinet6/in6_pcb.c: revision 1.163
sys/sys/lwp.h: revision 1.176
Add missing curlwp_bindx
--
Add missing curlwp_bindx
--
Check LP_BOUND is surely set in curlwp_bindx
This may find an extra call of curlwp_bindx.
--
Fix usage of curlwp_bind in ip_output
curlwp_bindx must be called in LIFO order, i.e., we can't call curlwp_bind
and curlwp_bindx like this:
bound1 = curlwp_bind();
bound2 = curlwp_bind();
curlwp_bindx(bound1);
curlwp_bindx(bound2);
ip_output did so if NET_MPSAFE. Fix it.
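The ordering rule above can be illustrated with a small userland model. This is only a sketch: it assumes that curlwp_bind returns the previous bound state and that curlwp_bindx restores it after checking that the bound flag is still set. The names model_bind, model_bindx, LP_BOUND_MODEL and l_pflag_model are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Userland model only: hypothetical stand-ins for the LWP bound flag. */
#define LP_BOUND_MODEL 0x1

static int l_pflag_model;

/* Save the previous bound bit, then mark the LWP as bound. */
static int
model_bind(void)
{
	int bound = l_pflag_model & LP_BOUND_MODEL;

	l_pflag_model |= LP_BOUND_MODEL;
	return bound;
}

/*
 * Restore the saved bit. Returns false where a "bound flag must be set"
 * check would fire, i.e. when the bit is already clear on entry.
 */
static bool
model_bindx(int bound)
{
	if ((l_pflag_model & LP_BOUND_MODEL) == 0)
		return false;
	l_pflag_model ^= bound ^ LP_BOUND_MODEL;
	return true;
}

/* Nested use in LIFO order: the inner pair is released first. */
static bool
nested_lifo(void)
{
	l_pflag_model = 0;
	int bound1 = model_bind();
	int bound2 = model_bind();
	bool ok = model_bindx(bound2);   /* inner first */
	ok = model_bindx(bound1) && ok;  /* then outer */
	return ok && l_pflag_model == 0;
}

/* The order quoted above: the outer value is restored first. */
static bool
nested_fifo(void)
{
	l_pflag_model = 0;
	int bound1 = model_bind();
	int bound2 = model_bind();
	bool ok = model_bindx(bound1);   /* clears the bit too early */
	ok = model_bindx(bound2) && ok;  /* bit already clear: check fails */
	return ok;
}
```

In this model, nested_lifo() leaves the flag as it found it, while nested_fifo() trips the check on the second release, which is the kind of extra or misordered call the LP_BOUND check is meant to catch.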
--
Fix wrong usage of psref_held
We can't use it for checking if a caller does NOT hold a given target.
If you want to do it you should have psref_not_held or something.
 1.203.2.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
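A minimal sketch of the idea, assuming wrapper macros that hide the NET_MPSAFE test behind a single name. The macro names and the counter-based KERNEL_LOCK stand-ins below are illustrative only; they are defined here so the sketch builds in userland and are not taken from the tree.

```c
#include <assert.h>

/* Hypothetical stand-ins so the sketch builds outside the kernel. */
static int kernel_lock_depth;
#define KERNEL_LOCK(n, l)     (kernel_lock_depth += (n))
#define KERNEL_UNLOCK_ONE(l)  (kernel_lock_depth -= 1)

/* The NET_MPSAFE switch lives here once, instead of at every call site. */
#ifdef NET_MPSAFE
#define KERNEL_LOCK_UNLESS_NET_MPSAFE()    do { } while (0)
#define KERNEL_UNLOCK_UNLESS_NET_MPSAFE()  do { } while (0)
#else
#define KERNEL_LOCK_UNLESS_NET_MPSAFE()    do { KERNEL_LOCK(1, NULL); } while (0)
#define KERNEL_UNLOCK_UNLESS_NET_MPSAFE()  do { KERNEL_UNLOCK_ONE(NULL); } while (0)
#endif

/* A call site no longer needs its own #ifndef NET_MPSAFE block. */
static int
do_protected_step(void)
{
	int depth_inside;

	KERNEL_LOCK_UNLESS_NET_MPSAFE();
	depth_inside = kernel_lock_depth;  /* non-MP-safe work would go here */
	KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
	return depth_inside;
}
```

Built without NET_MPSAFE, the call site takes the modelled lock once and releases it. Built with NET_MPSAFE, both macros expand to nothing, and any lock that is still taken explicitly stands out when reading the code.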
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.203.2.3 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add support for IP_PKTINFO in sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.203.2.2 17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #353):
sys/net/if_llatbl.c: 1.22
sys/net/if_llatbl.h: 1.13
sys/netinet/if_arp.c: 1.254
sys/netinet/in.c: 1.208-1.209
sys/netinet6/in6.c: 1.249-1.250
sys/netinet6/nd6.c: 1.237
Remove redundant KASSERTMSG
The function is static, has just one caller and the caller does the same check.
--
Fix a deadlock between a route update and lltable
It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.
If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.
A resource allocation graph of the scenario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update
Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.
Note that the deadlock happens only if NET_MPSAFE is enabled.
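The shape of that fix (doing the possibly-blocking lookup before taking the write lock, not inside it) can be sketched in a userland model. Everything below is a hypothetical stand-in: the flag table_wlocked models the write lock being held and model_route_lookup models a lookup that may wait for a route update. This is not the lltable code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userland model only; every name here is a hypothetical stand-in. */
static bool table_wlocked;      /* models the table write lock being held */
static bool lookup_under_lock;  /* set if the lookup ever ran with the lock held */

/* Models a route lookup that may wait for a route update to finish. */
static int
model_route_lookup(void)
{
	if (table_wlocked)
		lookup_under_lock = true;
	return 1; /* pretend a usable route was found */
}

/* Before: the possibly-blocking lookup runs inside the write lock. */
static void
create_entry_before(void)
{
	table_wlocked = true;
	(void)model_route_lookup();
	/* ... insert the entry ... */
	table_wlocked = false;
}

/* After: look up first, then take the lock only for the insertion. */
static int
create_entry_after(void)
{
	int route_ok = model_route_lookup();

	if (!route_ok)
		return -1;
	table_wlocked = true;
	/* ... insert the entry ... */
	table_wlocked = false;
	return 0;
}
```

With the lookup hoisted out, a waiter never sleeps on a route update while holding the write lock, so a softint that needs the same lock can still make progress.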
 1.203.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproduces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an identical
destination on different interfaces. This is normal and can be
reproduced easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.219.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.219.2.6 21-May-2018  pgoyette Sync with HEAD
 1.219.2.5 02-May-2018  pgoyette Synch with HEAD
 1.219.2.4 22-Apr-2018  pgoyette Sync with HEAD
 1.219.2.3 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.219.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.219.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.231.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.231.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.231.2.1 10-Jun-2019  christos Sync with HEAD
 1.234.2.2 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1883):

tests/net/arp/t_dad.sh: revision 1.16
sys/netinet/in.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.46
sys/netinet/if_arp.c: revision 1.314

arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
the own address, i.e., address duplication, the original behavior of
a kernel prior to supporting IPv4 DAD is to send an ARP reply. It is
the same with a latest kernel with DAD enabled. However, a latest
kernel without DAD sends back a GARP packet. Restore the original
behavior.

inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.

tests, arp: add tests of address duplications without DAD

tests, arp: add tests for GARP on link up
 1.234.2.1 08-Oct-2020  martin Pull up following revision(s) (requested by roy in ticket #1104):

sys/netinet/in.c: revision 1.241
sys/netinet6/in6.c: revision 1.282

inet: Treat LINK_STATE_UNKNOWN as LINK_STATE_UP when changing

It's something we have always done.
it's really rare for anything to transition to UNKNOWN from either
UP or DOWN, but technically it is possible.
 1.247.8.1 02-Aug-2025  perseant Sync with HEAD
 1.247.2.1 24-Aug-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #812):

tests/net/arp/t_dad.sh: revision 1.16
sys/netinet/in.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.46
sys/netinet/if_arp.c: revision 1.314

arp: fix the behavior on detecting an address duplication without IPv4 DAD

On receiving an ARP request that has the same source protocol address as
the own address, i.e., address duplication, the original behavior of
a kernel prior to supporting IPv4 DAD is to send an ARP reply. It is
the same with a latest kernel with DAD enabled. However, a latest
kernel without DAD sends back a GARP packet. Restore the original
behavior.

inet: send GARP on link up if DAD is disabled

This behavior was accidentally removed at rev 1.233.

tests, arp: add tests of address duplications without DAD

tests, arp: add tests for GARP on link up
 1.115 16-Jun-2023  rin White space fixes. No binary changes.
 1.114 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.113 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
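
The idea of enforcing an on-wire layout with compile-time assertions instead of __packed can be sketched as follows. This is only an illustration: struct ex_udp_hdr is a hypothetical stand-in, not a NetBSD structure, and C11 _Static_assert is assumed here in place of the kernel's CTASSERT/__CTASSERT macros.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wire-format header (not taken from the NetBSD tree).
 * Every member is naturally aligned, so no padding is expected and
 * __packed is unnecessary.
 */
struct ex_udp_hdr {
	uint16_t src_port;
	uint16_t dst_port;
	uint16_t length;
	uint16_t checksum;
};

/* Assumption: _Static_assert plays the role of CTASSERT in this sketch. */
_Static_assert(sizeof(struct ex_udp_hdr) == 8,
    "header must occupy exactly 8 bytes on the wire");
_Static_assert(offsetof(struct ex_udp_hdr, length) == 4,
    "length field must sit at byte offset 4");
```

If a compiler ever inserted padding, the build would fail at these assertions rather than silently producing a mismatched layout.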
 1.112 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.111 08-Sep-2020  christos branches: 1.111.2;
Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.110 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.109 18-Dec-2019  roy inet: Add support for IPv4 /31 prefixes, as described in RFC 3021.

To run a /31 network, participating hosts MUST drop support
for directed broadcasts, and treat the first and last addresses
on subnet as unicast. The broadcast address for the prefix
should be the link local broadcast address, INADDR_BROADCAST.

Taken from FreeBSD, r226402.
Fixes PR kern/51388.
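
The /31 rule described above can be sketched in a few lines. The helper name ex_broadcast_for_prefix and the EX_ constant are hypothetical and are not the kernel's code; addresses are assumed to be in host byte order.

```c
#include <stdint.h>

/* Limited (link-local) broadcast address 255.255.255.255, host byte order. */
#define EX_INADDR_BROADCAST UINT32_C(0xffffffff)

/*
 * Hypothetical helper: choose the broadcast address for an IPv4 prefix.
 * addr is in host byte order; prefixlen is expected to be 0..32.
 */
uint32_t
ex_broadcast_for_prefix(uint32_t addr, unsigned prefixlen)
{
	uint32_t mask;

	if (prefixlen == 31) {
		/* RFC 3021: both addresses are unicast, no directed broadcast. */
		return EX_INADDR_BROADCAST;
	}
	/* Avoid an undefined 32-bit shift when prefixlen is 0. */
	mask = prefixlen == 0 ? 0 : (uint32_t)(~UINT32_C(0) << (32 - prefixlen));
	/* Ordinary prefixes: set all host bits. */
	return addr | ~mask;
}
```

For 192.0.2.0/31 the helper yields the limited broadcast address, while 192.0.2.0/24 still yields the directed broadcast 192.0.2.255.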
 1.108 09-Nov-2018  maya Use the same type redefinition guards as stdint.h since rev1.8

PR pkg/53713
 1.107 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.106 11-Jul-2018  maxv Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.
 1.105 19-Apr-2018  christos branches: 1.105.2;
s/static inline/static __inline/g for consistency.
 1.104 09-Feb-2018  maxv branches: 1.104.2;
Remove dead code.
 1.103 10-Jan-2018  knakahara add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.102 01-Jan-2018  christos 1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo
 1.101 10-Aug-2017  ryo Add support IP_PKTINFO for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.100 16-Feb-2017  knakahara branches: 1.100.6;
add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.99 01-Aug-2016  ozaki-r branches: 1.99.2;
Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.98 13-Oct-2015  rjs branches: 1.98.2;
Add core networking support for SCTP.
 1.97 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.96 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.95 02-Dec-2014  christos use the new printing code.
 1.94 02-Dec-2014  christos add routines to print in_addr and sockaddr_in (in_print and sin_print)
 1.93 12-Oct-2014  christos branches: 1.93.2;
document that we depend on the option numbers matching.
 1.92 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.91 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.90 22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong to. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.89 27-Jun-2013  christos branches: 1.89.2; 1.89.6;
implement IP_PKTINFO and IP_RECVPKTINFO.
 1.88 27-Apr-2013  joerg Systematically include sys/featuretest.h when _NETBSD_SOURCE is used.
Some are redundant, but make verification with grep much easier.
 1.87 22-Jun-2012  christos branches: 1.87.2;
PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.86 14-Sep-2009  degroote branches: 1.86.12;
Import pfsync support from OpenBSD 4.2

Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.

This work was part of my 2009 GSoC

No objection on tech-net@
 1.85 17-Jul-2009  minskim Add the IP_MINTTL socket option.

The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value. This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.

OK'ed by christos@.
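
A minimal userland sketch of enabling the option on a stream socket follows. The helper name ex_set_minttl is hypothetical, and passing the threshold as an int is an assumption about the option's argument type; for GTSM the peer would also be expected to send with TTL 255.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: ask the stack to discard packets received on fd
 * whose TTL is lower than minttl. Returns 0 on success, -1 on failure
 * (including platforms that do not define IP_MINTTL).
 */
int
ex_set_minttl(int fd, int minttl)
{
#ifdef IP_MINTTL
	return setsockopt(fd, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl));
#else
	(void)fd;
	(void)minttl;
	return -1;
#endif
}
```

A GTSM-style receiver would typically call ex_set_minttl(s, 255) right after creating the socket, so that only packets from directly connected peers are accepted.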
 1.84 16-Jul-2009  minskim Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
 1.83 25-Jan-2008  joerg branches: 1.83.2; 1.83.10; 1.83.24;
Refactor in_cksum/in4_cksum/in6_cksum implementations:
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one's-complement sum.
The default implementation is moderately fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.

This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
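
The 32-bit-accumulator variant mentioned above can be sketched over a flat buffer. This is not the cpu_in_cksum code: the name ex_in_cksum is hypothetical, and a real implementation walks an mbuf chain rather than a single contiguous buffer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: Internet checksum over a contiguous buffer,
 * using a 32-bit accumulator and 16-bit big-endian addends.
 */
uint16_t
ex_in_cksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len >= 2) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		/* Fold early so very long inputs cannot overflow. */
		if (sum & UINT32_C(0x80000000))
			sum = (sum & 0xffff) + (sum >> 16);
		buf += 2;
		len -= 2;
	}
	if (len == 1) {
		/* A trailing odd byte is padded with a zero low byte. */
		sum += (uint32_t)buf[0] << 8;
	}
	/* Fold the carries back in (end-around carry). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Running the function over a header that already contains a correct checksum returns 0, which is the usual way a receiver verifies it.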
 1.82 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.81 19-Sep-2007  dyoung branches: 1.81.6; 1.81.8; 1.81.12;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
 1.80 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.79 02-May-2007  dyoung branches: 1.79.2; 1.79.6; 1.79.8;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.78 17-Feb-2007  dyoung branches: 1.78.4; 1.78.6;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.77 13-Nov-2006  dyoung branches: 1.77.4;
Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
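
The address predicates in item 3 can be sketched as plain functions on host-byte-order addresses. The ex_ names are hypothetical and are not the kernel macros; reading "multicast" in IN_ANY_LOCAL() as link-local multicast (224.0.0.0/24) is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link-local unicast, 169.254.0.0/16. */
bool
ex_in_linklocal(uint32_t a)
{
	return (a & UINT32_C(0xffff0000)) == UINT32_C(0xa9fe0000);
}

/* RFC 1918 private ranges: 10/8, 172.16/12 and 192.168/16. */
bool
ex_in_private(uint32_t a)
{
	return (a & UINT32_C(0xff000000)) == UINT32_C(0x0a000000) ||
	    (a & UINT32_C(0xfff00000)) == UINT32_C(0xac100000) ||
	    (a & UINT32_C(0xffff0000)) == UINT32_C(0xc0a80000);
}

/* Assumption: "local" multicast means the 224.0.0.0/24 group range. */
bool
ex_in_any_local(uint32_t a)
{
	return ex_in_linklocal(a) ||
	    (a & UINT32_C(0xffffff00)) == UINT32_C(0xe0000000);
}
```

A source-address selection policy could use predicates like these to prefer, say, a global address over a link-local one when both are configured on an interface.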
 1.76 18-May-2006  liamjfoy branches: 1.76.8; 1.76.10;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.75 16-Feb-2006  perry branches: 1.75.2; 1.75.6;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.74 24-Dec-2005  perry branches: 1.74.2; 1.74.4; 1.74.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.73 20-Dec-2005  christos Define INADDR_NONE when we are in the kernel too.
 1.72 11-Dec-2005  christos merge ktrace-lwp.
 1.71 05-Aug-2005  elad Add sysctls for IP, ICMP, TCP, and UDP statistics.
 1.70 31-Jan-2005  kim branches: 1.70.6;
Add RFC 3378 EtherIP support, ported from OpenBSD to NetBSD by
Hans Rosenfeld (rosenfeld at grumpf.hope-2000.org)

This change makes it possible to add gif interfaces to bridges, which
will then send and receive IP protocol 97 packets. Packets are Ethernet
frames with an EtherIP header prepended.
 1.69 15-Dec-2004  thorpej branches: 1.69.2; 1.69.4;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.68 04-Sep-2004  manu IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.67 07-May-2004  jonathan Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.66 21-Apr-2004  itojun no space between function name and paren: foo (blah) -> foo(blah)
 1.65 18-Apr-2004  matt De __P()
 1.64 19-Nov-2003  jonathan branches: 1.64.2;
Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.
 1.63 10-Nov-2003  jonathan Allocate sysctl oid for ipv4 sysctl node "ifq", define symbolic name, and
bump IPCTL_MAXID. (Should have been committed with other ifq sysctl changes).
 1.62 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.61 28-Apr-2003  bjh21 branches: 1.61.2;
Add a new feature-test macro, _NETBSD_SOURCE. If this is defined
by the application, all NetBSD interfaces are made visible, even
if some other feature-test macro (like _POSIX_C_SOURCE) is defined.
<sys/featuretest.h> defined _NETBSD_SOURCE if none of _ANSI_SOURCE,
_POSIX_C_SOURCE and _XOPEN_SOURCE is defined, so as to preserve
existing behaviour.

This has two major advantages:
+ Programs that require non-POSIX facilities but define _POSIX_C_SOURCE
can trivially be overruled by putting -D_NETBSD_SOURCE in their CFLAGS.
+ It makes most of the #ifs simpler, in that they're all now ORs of the
various macros, rather than having checks for (!defined(_ANSI_SOURCE) ||
!defined(_POSIX_C_SOURCE) || !defined(_XOPEN_SOURCE)) all over the place.

I've tried not to change the semantics of the headers in any case where
_NETBSD_SOURCE wasn't defined, but there were some places where the
current semantics were clearly mad, and retaining them was harder than
correcting them. In particular, I've mostly normalised things so that
_ANSI_SOURCE gets you the smallest set of stuff, then _POSIX_C_SOURCE,
_XOPEN_SOURCE and _NETBSD_SOURCE in that order.

Tested by building for vax, encouraged by thorpej, and uncontested in
tech-userlevel for a week.
 1.60 12-Apr-2003  dogcow PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.
 1.59 27-Jan-2003  kleink C++ does not permit a static data member to have the same name as its
class, so in a C++ environment rename the ip_opts member to Ip_opts as
observed in several other implementations; from Jon Olsson in
PR toolchain/19880.
 1.58 13-May-2002  kleink branches: 1.58.4;
Define uint{8,32}_t locally, per XNS5.2/POSIX-2001, and use them in this
header where applicable; use private fixed-width integer types otherwise.
 1.57 12-May-2002  kleink Provide local definitions of in_{addr,port}_t in <netinet/in.h> and use
them where deemed appropriate by XNS5.2/POSIX-2001.
 1.56 24-Feb-2002  martin Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.
 1.55 02-Jun-2001  thorpej branches: 1.55.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
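
The two-stage scheme described above (cache the pseudo-header sum early, finish the checksum later only if hardware cannot) can be sketched as follows. All ex_ names are hypothetical and buffers are assumed contiguous; the kernel operates on mbufs and records the deferred work in packet flags, which this sketch does not model.

```c
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words to a folded one's-complement running sum. */
uint32_t
ex_sum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	while (len >= 2) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		sum = (sum & 0xffff) + (sum >> 16);
		buf += 2;
		len -= 2;
	}
	if (len == 1) {
		/* Trailing odd byte, padded with a zero low byte. */
		sum += (uint32_t)buf[0] << 8;
		sum = (sum & 0xffff) + (sum >> 16);
	}
	return sum;
}

/* Stage 1: sum of the IPv4 pseudo-header, computed early and cached. */
uint32_t
ex_pseudo_sum(const uint8_t src[4], const uint8_t dst[4], uint8_t proto,
    uint16_t l4len)
{
	const uint8_t tail[4] = {
		0, proto, (uint8_t)(l4len >> 8), (uint8_t)(l4len & 0xff)
	};
	uint32_t sum = ex_sum_add(0, src, 4);

	sum = ex_sum_add(sum, dst, 4);
	return ex_sum_add(sum, tail, 4);
}

/* Stage 2: software fallback, run late if the NIC cannot do the sum. */
uint16_t
ex_finish_cksum(uint32_t cached, const uint8_t *l4, size_t len)
{
	uint32_t sum = ex_sum_add(cached, l4, len);

	return (uint16_t)~sum;
}
```

Because one's-complement addition is associative, the cached stage-1 value can be handed to hardware or completed in software without changing the final result.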
 1.54 27-May-2001  itojun typo in comment
 1.53 27-Mar-2001  itojun net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.
 1.52 19-Jan-2001  kleink branches: 1.52.2;
Add IPPROTO_VRRP.
 1.51 28-Aug-2000  simonb #define<tab> cleanup.
 1.50 25-Aug-2000  tron Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.
 1.49 28-Jul-2000  kleink Avoid recursion with traditional cpp.
 1.48 26-Jun-2000  kleink XNS5.2: define sa_family_t and use it where specified by the standard.
 1.47 10-Mar-2000  itojun branches: 1.47.4;
move IPPROTO_DONE to IPPROTO_xx group
 1.46 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.45 09-Feb-2000  itojun to improve RFC2553/2292 compliance, and promote use of
RFC2553/2292-compliant header file path, now the following headers are
forbidden:
netinet6/ip6.h
netinet6/icmp6.h
netinet6/in6.h

if you want netinet6/{ip6,icmp6}.h, use netinet/{ip6,icmp6}.h.

if you want netinet6/in6.h, you just need to include netinet/in.h.
it pulls it in.
(we may need to integrate them into netinet/in.h, but for cross-BSD code
sharing i'd like to keep it like this for now)
 1.44 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.43 20-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol data.
 1.42 02-Jul-1999  itojun branches: 1.42.2; 1.42.8;
move ipsec sysctl index to IPPROTO_AH (instead of IPPROTO_ESP),
so that you can perform sysctl operation when ESP is not compiled in.
 1.41 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.40 26-Jun-1999  sommerfeld If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
 1.39 14-Sep-1998  hwr branches: 1.39.6; 1.39.8; 1.39.10;
Typo. :(
 1.38 14-Sep-1998  hwr Some additions.
And IDPR-CMTP is 38 not 39 according to IANA.
 1.37 13-Sep-1998  hwr Add a gre tunnel pseudo network device. Gre = generic route encapsulation.
This device shows up like any other network interface and can be used to
tunnel L3 protocols as e.g. IP over IP.
 1.36 05-Sep-1998  kleink Protect _XOPEN_SOURCE against sysctl MIB identifiers.
 1.35 04-May-1998  matt Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.
 1.34 29-Apr-1998  kml Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.
 1.33 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.32 07-Jan-1998  lukem add the following, derived from FreeBSD:
* IP_PORTRANGE socket option, which controls how the ephemeral ports
are allocated. it takes the following settings:
IP_PORTRANGE_DEFAULT use anonportmin (49152) -> anonportmax (65535)
IP_PORTRANGE_HIGH as IP_PORTRANGE_DEFAULT (retained for FreeBSD
compat reasons, where these are separate)
IP_PORTRANGE_LOW use 600 -> 1023. only works if uid==0.
* in_pcb flag INP_ANONPORT. set if port was allocated ephemerally
 1.31 05-Jan-1998  lukem enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
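The range constraints and the "cycle through the available set once" idea in the 1.31 entry can be sketched roughly as follows. This is only an illustration, not the code from in_pcb.c: the `sketch_` names, the `busy` array, and the upward search direction are all assumptions, and the IPNOPRIVPORTS exception is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_IPPORT_RESERVED 1024   /* assumed value of IPPORT_RESERVED */
#define SKETCH_PORT_MAX        65535

/* True if [lo, hi] satisfies the constraints listed in the log entry. */
bool
sketch_anonport_range_ok(int lo, int hi)
{
	if (lo < SKETCH_IPPORT_RESERVED || hi > SKETCH_PORT_MAX)
		return false;
	return lo < hi;
}

/*
 * Visit every port in [lo, hi] exactly once, starting just after `last`
 * and wrapping around. busy[i] describes port lo + i (hypothetical
 * stand-in for a PCB lookup). Returns a free port, or -1 if none.
 */
int
sketch_pick_port(int lo, int hi, int last, const bool *busy)
{
	int count = hi - lo + 1;
	int port = last;

	assert(sketch_anonport_range_ok(lo, hi));
	for (int i = 0; i < count; i++) {
		port++;
		if (port < lo || port > hi)
			port = lo;              /* wrap to the bottom of the range */
		if (!busy[port - lo])
			return port;
	}
	return -1;                      /* whole range tried once, all in use */
}
```

Bounding the loop by the size of the range, rather than by comparing against the starting port, keeps the search correct even if the starting point lies outside the configured range.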
 1.30 30-Dec-1997  lukem as per the IANA assigned ports numbers document, use ports
49152..65535 for ephemeral ports (instead of 1024..5000).
closes my [kern/4440], but with correct code :)
 1.29 16-Dec-1997  thorpej Add INADDR_ALLRTRS_GROUP and INADDR_MAX_LOCAL_GROUP.
 1.28 18-Oct-1997  kml branches: 1.28.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
 1.27 17-Oct-1997  thorpej Allow `subnetsarelocal' to be changed via sysctl.
 1.26 14-Oct-1997  matt Add support for returning maximum supported MTU when ip_output fails with
EMSGSIZE.
 1.25 27-Aug-1997  matt Add IPPROTO_ESP and IPPROTO_AH defines.
 1.24 25-Feb-1997  cjs branches: 1.24.4;
Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.
 1.23 11-Jan-1997  thorpej branches: 1.23.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
 1.22 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.21 22-May-1996  mycroft A few style changes to match netiso and netns.
 1.20 13-Feb-1996  christos branches: 1.20.4;
netinet prototypes
 1.19 16-Jan-1996  thorpej Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.
 1.18 15-Jan-1996  thorpej Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.
 1.17 04-Jun-1995  mycroft Clean up many more casts.
 1.16 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.15 31-May-1995  mycroft Add IPPROTO_IP. Fix comment for IP_MULTICAST_IF.
 1.14 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.13 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.12 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.11 14-May-1994  cgd i forgot a four letter word...
 1.10 14-May-1994  cgd multiple inclusion protection, for the rpc headers
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 28-Jan-1994  deraadt need a stub 'struct socket;' for a prototype
 1.7 26-Jan-1994  cgd renumber the IP setsockopt options back to the Reno/Net2 versions,
moving the multicast options after them
From: Mike Karels <karels@BSDI.COM>
(grr.)
 1.6 09-Jan-1994  mycroft Prototype the rest.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 06-Dec-1993  hpeyerl multicast patches
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.20.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.23.4.1 12-Mar-1997  is Merge in changes from Trunk
 1.24.4.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.24.4.1 28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.28.2.2 09-May-1998  mycroft Pull up patch from kml.
 1.28.2.1 29-Jan-1998  mellon Pull up 1.29 (thorpej)
 1.39.10.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.39.10.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.39.10.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.39.8.2 02-Aug-1999  thorpej Update from trunk.
 1.39.8.1 01-Jul-1999  thorpej Sync w/ -current.
 1.39.6.1 30-May-2001  he Pull up revision 1.53 (via patch, requested by he):
Introduce net.inet.ip.maxfragpackets, which controls the maximum
number of IPv4 fragment reassembly queue entries. Defends against
certain DoS attacks. Fixes SA#2001-006.
 1.42.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.42.2.3 27-Mar-2001  bouyer Sync with HEAD.
 1.42.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.42.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.47.4.3 26-Feb-2002  he Pull up revision 1.56 (via patch, requested by martin):
Clear M_BCAST and M_MCAST on encapsulated packets on outgoing
mbufs. Also do not copy TTL from the inner packet, and make the
outer TTL sysctl'able. Fixes PR#14269, and makes traceroute work
over GRE tunnels.
 1.47.4.2 24-Apr-2001  he Pull up revision 1.53 (via patch, requested by itojun):
Introduce net.inet.ip.maxfragpackets, which controls the maximum
number of IPv4 fragment reassembly queue entries. Defends against
certain DoS attacks.
 1.47.4.1 26-Aug-2000  tron Pull up from current (approved by thorpej):

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.

syssrc/sys/netinet/in.h 1.49 -> 1.50
syssrc/sys/netinet/in_pcb.c 1.66 -> 1.67
syssrc/sys/netinet/ip_input.c 1.116 -> 1.117
syssrc/sys/netinet/ip_var.h 1.41 -> 1.42
 1.52.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.52.2.3 28-Feb-2002  nathanw Catch up to -current.
 1.52.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.52.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.55.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.55.2.1 16-Mar-2002  jdolecek Catch up with -current.
 1.58.4.1 28-Apr-2003  tron Pull up revision 1.59 (requested by kleink in ticket #1119):
C++ does not permit a static data member to have the same name as its
class, so in a C++ environment rename the ip_opts member to Ip_opts as
observed in several other implementations; from Jon Olsson in
PR toolchain/19880.
 1.61.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.61.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.61.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.61.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.61.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.61.2.1 03-Aug-2004  skrll Sync with HEAD
 1.64.2.1 10-May-2004  tron Pull up revision 1.67 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.69.4.1 12-Feb-2005  yamt sync with head.
 1.69.2.1 29-Apr-2005  kent sync with -current
 1.70.6.7 04-Feb-2008  yamt sync with head.
 1.70.6.6 21-Jan-2008  yamt sync with head
 1.70.6.5 27-Oct-2007  yamt sync with head.
 1.70.6.4 03-Sep-2007  yamt sync with head.
 1.70.6.3 26-Feb-2007  yamt sync with head.
 1.70.6.2 30-Dec-2006  yamt sync with head.
 1.70.6.1 21-Jun-2006  yamt sync with head.
 1.74.6.2 01-Jun-2006  kardel Sync with head.
 1.74.6.1 22-Apr-2006  simonb Sync with head.
 1.74.4.1 09-Sep-2006  rpaulo sync with head
 1.74.2.1 18-Feb-2006  yamt sync with head.
 1.75.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.75.2.1 24-May-2006  yamt sync with head.
 1.76.10.1 10-Dec-2006  yamt sync with head.
 1.76.8.1 18-Nov-2006  ad Sync with head.
 1.77.4.2 07-May-2007  yamt sync with head.
 1.77.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.78.6.1 11-Jul-2007  mjf Sync with head.
 1.78.4.2 09-Oct-2007  ad Sync with head.
 1.78.4.1 08-Jun-2007  ad Sync with head.
 1.79.8.3 23-Mar-2008  matt sync with HEAD
 1.79.8.2 09-Jan-2008  matt sync with HEAD
 1.79.8.1 06-Nov-2007  matt sync with HEAD
 1.79.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.79.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.79.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.81.12.1 02-Jan-2008  bouyer Sync with HEAD
 1.81.8.1 26-Dec-2007  ad Sync with head.
 1.81.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.83.24.1 23-Jul-2009  jym Sync with HEAD.
 1.83.10.3 16-Sep-2009  yamt sync with head
 1.83.10.2 19-Aug-2009  yamt sync with head.
 1.83.10.1 18-Jul-2009  yamt sync with head.
 1.83.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.86.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.86.12.1 30-Oct-2012  yamt sync with head
 1.87.2.3 03-Dec-2017  jdolecek update from HEAD
 1.87.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.87.2.1 23-Jun-2013  tls resync from head
 1.89.6.1 10-Aug-2014  tls Rebase.
 1.89.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.93.2.5 28-Aug-2017  skrll Sync with HEAD
 1.93.2.4 05-Oct-2016  skrll Sync with HEAD
 1.93.2.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.93.2.2 06-Jun-2015  skrll Sync with HEAD
 1.93.2.1 06-Apr-2015  skrll Sync with HEAD
 1.98.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.98.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.99.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.100.6.3 18-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #639):
sys/kern/uipc_socket.c: revision 1.258
sys/kern/uipc_socket.c: revision 1.259
sys/netinet/ip_input.c: revision 1.364 (via patch)
sys/netinet/ip_output.c: revision 1.289
sys/netinet/in.h: revision 1.102
sys/netinet/in_pcb.c: revision 1.181
share/man/man9/sockopt.9: revision 1.11
sys/netinet/in_pcb.h: revision 1.65
sys/sys/socketvar.h: revision 1.146
sys/kern/uipc_syscalls.c: revision 1.189
sys/netinet/ip_output.c: revision 1.290
share/man/man4/ip.4: revision 1.41
share/man/man4/ip.4: revision 1.42
sys/kern/uipc_syscalls.c: revision 1.190

pass valsize for getsockopt like we do for setsockopt
make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo

new sentence-new line

Remove comment now that the getsockopt code passes the size.

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).
(Tom Ivar Helbekkmo)
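The size-based dispatch described in item 4 above might look roughly like this. It is a sketch under stated assumptions, not the ip_output.c code: `struct sketch_in_pktinfo` is a hypothetical stand-in for `struct in_pktinfo`, and the enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in_pktinfo; the layout is an assumption. */
struct sketch_in_pktinfo {
	uint32_t     ipi_addr;      /* source address, network byte order */
	unsigned int ipi_ifindex;   /* interface index */
};

enum sketch_pktinfo_action {
	SKETCH_RECVPKTINFO_ON,      /* Linux-style use, non-zero int */
	SKETCH_RECVPKTINFO_OFF,     /* Linux-style use, zero int */
	SKETCH_SET_DEFAULT_SOURCE,  /* Solaris-style use, full structure */
	SKETCH_INVALID
};

/* The dispatch only works if the two accepted sizes differ. */
_Static_assert(sizeof(struct sketch_in_pktinfo) != sizeof(int),
    "option sizes must be distinguishable");

/* Decide what an IP_PKTINFO setsockopt means from the option length. */
enum sketch_pktinfo_action
sketch_classify_pktinfo(const void *val, size_t len)
{
	int flag;

	assert(val != NULL);
	if (len == sizeof(int)) {
		memcpy(&flag, val, sizeof(flag));
		return flag != 0 ? SKETCH_RECVPKTINFO_ON : SKETCH_RECVPKTINFO_OFF;
	}
	if (len == sizeof(struct sketch_in_pktinfo))
		return SKETCH_SET_DEFAULT_SOURCE;
	return SKETCH_INVALID;
}
```

Treating any other length as invalid is a choice made for this sketch; the log entry itself only specifies the two recognized sizes for the set path.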
 1.100.6.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.100.6.1 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.104.2.4 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.104.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.104.2.2 28-Jul-2018  pgoyette Sync with HEAD
 1.104.2.1 22-Apr-2018  pgoyette Sync with HEAD
 1.105.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.105.2.1 10-Jun-2019  christos Sync with HEAD
 1.111.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.20 30-Nov-2014  christos Only check that the offset < sizeof(struct ip) if nxt != 0, i.e. in the
tcp and udp cases. From kre.
XXX: pullup 7
 1.19 12-Mar-2013  christos branches: 1.19.12; 1.19.14;
- Do the size checks before calling the cpu checksum code. Otherwise you'll
crash there and not panic.
- Don't panic on short packets unless DIAGNOSTIC. In general we should try
to make the kernel survive errors...
 1.18 25-Apr-2011  yamt branches: 1.18.4; 1.18.14;
fix assertions
 1.17 12-Feb-2008  joerg branches: 1.17.32; 1.17.38;
Explicitly predict panic conditions as false.
 1.16 07-Feb-2008  joerg Reimplement in4_cksum to not copy data, but sum up directly.
Tested on sparc and m68k by martin@.
 1.15 25-Jan-2008  joerg Refactor in_cksum/in4_cksum/in6_cksum implementations:
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one's-complement sum.
The default implementation is moderately fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.

This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
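For readers unfamiliar with the technique, the "32bit accumulator with 16bit addends" variant of the one's-complement sum can be sketched as below, following RFC 1071. This is a byte-buffer illustration only: the name `sketch_in_cksum` is hypothetical, and the real cpu_in_cksum walks mbuf chains and takes different arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement Internet checksum over a flat buffer, using a
 * 32-bit accumulator and 16-bit big-endian addends (RFC 1071 style).
 */
uint16_t
sketch_in_cksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	assert(data != NULL || len == 0);

	while (len > 1) {
		sum += (uint32_t)((data[0] << 8) | data[1]);
		/* Fold early so the accumulator cannot overflow on long input. */
		if (sum & 0x80000000u)
			sum = (sum & 0xffff) + (sum >> 16);
		data += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)(data[0] << 8);   /* pad odd byte with zero */

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Summing the same data with its checksum appended yields zero, which is how a receiver verifies a packet without recomputing and comparing the field separately.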
 1.14 04-Mar-2007  tsutsui branches: 1.14.16; 1.14.22;
Pass (char *) to mtod(9) on address calculation.
 1.13 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.12 11-Dec-2005  christos branches: 1.12.26;
merge ktrace-lwp.
 1.11 03-Feb-2005  perry branches: 1.11.6;
ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.10 07-Aug-2003  agc branches: 1.10.8; 1.10.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.9 29-Jul-2002  itojun branches: 1.9.6;
be friendly with gcc-3.1.1 -O2, which takes advantage of ANSI C
pointer aliasing rule (gcc optimization/7427). from tsubai, sync w/kame
 1.8 21-Dec-2001  itojun branches: 1.8.8;
whitespace. sync with kame
 1.7 13-Nov-2001  lukem add RCSIDs
 1.6 19-May-2001  thorpej branches: 1.6.2;
Brain'o in last. Pointed out by Steve Woodford <scw@netbsd.org>.
 1.5 19-May-2001  thorpej Don't compute pseudo header checksum if nxt == 0.
 1.4 30-Mar-2000  augustss branches: 1.4.6; 1.4.8;
Remove register declarations.
 1.3 15-Feb-2000  itojun make assumption on mbuf explicit (m->m_len >= sizeof (struct ip)).
 1.2 13-Dec-1999  itojun branches: 1.2.2;
sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.1 30-Nov-1999  itojun branches: 1.1.2;
file in4_cksum.c was initially added on branch kame.
 1.1.2.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.2.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.8.4 01-Aug-2002  nathanw Catch up to -current.
 1.4.8.3 08-Jan-2002  nathanw Catch up to -current.
 1.4.8.2 14-Nov-2001  nathanw Catch up to -current.
 1.4.8.1 21-Jun-2001  nathanw Catch up to -current.
 1.4.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.6.1 30-Mar-2000  bouyer file in4_cksum.c was added on branch thorpej_scsipi on 2000-11-20 18:10:22 +0000
 1.6.2.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.6.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.8.1 29-Aug-2002  gehenna catch up with -current.
 1.9.6.4 04-Feb-2005  skrll Sync with HEAD.
 1.9.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.6.1 03-Aug-2004  skrll Sync with HEAD
 1.10.10.1 12-Feb-2005  yamt sync with head.
 1.10.8.1 29-Apr-2005  kent sync with -current
 1.11.6.4 27-Feb-2008  yamt sync with head.
 1.11.6.3 11-Feb-2008  yamt sync with head.
 1.11.6.2 04-Feb-2008  yamt sync with head.
 1.11.6.1 03-Sep-2007  yamt sync with head.
 1.12.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.14.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.14.16.1 23-Mar-2008  matt sync with HEAD
 1.17.38.1 06-Jun-2011  jruoho Sync with HEAD.
 1.17.32.1 31-May-2011  rmind sync with head
 1.18.14.2 03-Dec-2017  jdolecek update from HEAD
 1.18.14.1 23-Jun-2013  tls resync from head
 1.18.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.19.14.1 06-Apr-2015  skrll Sync with HEAD
 1.19.12.1 01-Dec-2014  martin Pull up following revision(s) (requested by christos in ticket #281):
sys/netinet/in4_cksum.c: revision 1.20
Only check that the offset < sizeof(struct ip) if nxt != 0, i.e. in the
tcp and udp cases. From kre.
 1.22 25-Jan-2008  joerg Refactor in_cksum/in4_cksum/in6_cksum implementations:
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one's-complement sum.
The default implementation is moderately fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.

This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
 1.21 09-Jan-2008  joerg When not compiling for the kernel, use stdio.h instead of sys/systm.h
(printf) and locally define the prototype. Makes it possible to use
in_cksum.c for regression testing.
 1.20 09-Jan-2008  joerg Anyone seriously interested in implementing in_cksum on a new platform
should read RFC 1071, so point them to it.
 1.19 11-Dec-2005  christos branches: 1.19.46; 1.19.52; 1.19.60;
merge ktrace-lwp.
 1.18 03-Feb-2005  perry branches: 1.18.6;
ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.17 07-Aug-2003  agc branches: 1.17.8; 1.17.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.16 09-Jun-2002  itojun branches: 1.16.6;
whitespace
 1.15 13-Nov-2001  lukem branches: 1.15.8;
add RCSIDs
 1.14 30-Mar-2000  augustss branches: 1.14.6; 1.14.8;
Remove register declarations.
 1.13 13-Oct-1996  christos branches: 1.13.28;
backout previous kprintf changes
 1.12 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.11 08-Apr-1996  jonathan fixes for -Wall -Wmissing-prototypes:
include <sys/systm.h> to get a prototyped declaration of printf().
include <netinet/in.h> to get a prototyped declaration of in_cksum().
 1.10 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.7 10-Jan-1994  mycroft Fix function name.
 1.6 09-Jan-1994  mycroft Prototype the rest.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 17-Apr-1993  glass this file is never compiled, nor included in 'files' because it is adapted
for the particular architecture. However, it never would've compiled either
as it had the old '../h/foo.h' stuff in it.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.13.28.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.8.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.14.8.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.14.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.14.6.1 14-Nov-2001  nathanw Catch up to -current.
 1.15.8.1 20-Jun-2002  gehenna catch up with -current.
 1.16.6.4 04-Feb-2005  skrll Sync with HEAD.
 1.16.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.16.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.16.6.1 03-Aug-2004  skrll Sync with HEAD
 1.17.10.1 12-Feb-2005  yamt sync with head.
 1.17.8.1 29-Apr-2005  kent sync with -current
 1.18.6.2 04-Feb-2008  yamt sync with head.
 1.18.6.1 21-Jan-2008  yamt sync with head
 1.19.60.1 10-Jan-2008  bouyer Sync with HEAD
 1.19.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.19.46.1 23-Mar-2008  matt sync with HEAD
 1.96 07-Dec-2022  knakahara gif(4), ipsec(4) and l2tp(4) use encap_attach_addr().
 1.95 19-Sep-2019  knakahara Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
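
The hazard and the fix in revision 1.95 can be simulated in userland. The sketch below is an assumption-laden model, not percpu(9): percpu_sketch, rtcache_obj and both functions are hypothetical names, and calloc/free stand in for the kernel allocator. It only shows that when each per-CPU slot holds a pointer, enlarging the storage moves the pointers but leaves the pointed-to object where it was.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a route cache; not the kernel's struct route. */
struct rtcache_obj {
	int generation;
};

/* Toy per-CPU storage: each slot holds only a pointer to the object. */
struct percpu_sketch {
	struct rtcache_obj **slots;
	size_t nslots;
};

static int
percpu_sketch_init(struct percpu_sketch *pc, size_t n)
{
	size_t i;

	assert(n > 0);
	pc->slots = calloc(n, sizeof(*pc->slots));
	if (pc->slots == NULL)
		return -1;
	pc->nslots = n;
	for (i = 0; i < n; i++) {
		pc->slots[i] = calloc(1, sizeof(**pc->slots));
		if (pc->slots[i] == NULL)
			return -1;	/* sketch: leaks on failure */
	}
	return 0;
}

/*
 * Enlarge as the log describes: allocate a larger area, copy the old
 * contents into it, destroy the old area. Only the pointers move.
 */
static int
percpu_sketch_enlarge(struct percpu_sketch *pc, size_t n)
{
	struct rtcache_obj **nslots;
	size_t i;

	assert(n > pc->nslots);
	nslots = calloc(n, sizeof(*nslots));
	if (nslots == NULL)
		return -1;
	memcpy(nslots, pc->slots, pc->nslots * sizeof(*nslots));
	for (i = pc->nslots; i < n; i++) {
		nslots[i] = calloc(1, sizeof(**nslots));
		if (nslots[i] == NULL)
			return -1;	/* sketch: leaks on failure */
	}
	free(pc->slots);
	pc->slots = nslots;
	pc->nslots = n;
	return 0;
}
```

A thread that fetched slots[0] before the enlargement still holds a valid object pointer afterwards, which is the property the commit relies on for code paths that may sleep.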
 1.94 01-May-2018  maxv branches: 1.94.2; 1.94.6;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.93 27-Apr-2018  knakahara Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They keep the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address that lockdebug
saved. That can cause a false "already initialized" detection.
 1.92 10-Jan-2018  knakahara branches: 1.92.2;
apply in{,6}_tunnel_validate() to gif(4).
 1.91 27-Nov-2017  knakahara IFF_RUNNING checking in Rx and Tx processing is unnecessary now.

Because the configs of gif (members of gif_var) are protected by psref(9).
 1.90 27-Nov-2017  knakahara preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).

Now that the Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).

update locking notes later.
 1.89 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.88 21-Sep-2017  knakahara add lock for percpu route like l2tp(4).
 1.87 06-Jan-2017  knakahara branches: 1.87.8;
remove unnecessary conversion.

gif_softc->gif_pdst is already valid sockaddr.
 1.86 14-Dec-2016  knakahara fix race of gif_softc->gif_ro when we send multiple flows over gif on NET_MPSAFE enabled kernel.

make gif_softc->gif_ro percpu as well as ipforward_rt to resolve this race.
and add future TODO comment for etherip(4).
 1.85 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.84 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.83 18-Aug-2016  knakahara remove unnecessary dependency on gif.h to become friendly with module and rump.

When in_gif.c becomes a compile target, NGIF is always more than 1.
 1.82 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.81 06-Jul-2016  ozaki-r branches: 1.81.2;
Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.
 1.80 06-Jul-2016  knakahara fix build failure when defined GIF_ENCAPCHECK
 1.79 06-Jul-2016  ozaki-r Apply m_get_rcvif_psref (kill m_get_rcvif_NOMPSAFE)
 1.78 04-Jul-2016  knakahara fix: gif(4) receive side race

A panic occurs in rn_match() called by encap[46]_lookup(). The reason is that
gif(4) does not suspend receive packet processing, even though it suspends
transmit packet processing, while anyone is doing gif(4) ioctl.
 1.77 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (1/2) : gif(4) side

To prevent calling softint_schedule() after called softint_disestablish(),
the following modifications are added
+ ioctl (writing configuration) side
- off IFF_RUNNING flag before changing configuration
- wait softint handler completion before changing configuration
+ packet processing (reading configuration) side
- if IFF_RUNNING flag is off, do nothing
+ in whole
- add gif_list_lock_{enter,exit} to prevent the same configuration from being
set on other gif(4) interfaces
 1.76 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
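
The index-instead-of-pointer idea in revision 1.76 can be sketched without any locking. Everything below is hypothetical (ifnet_sketch, mbuf_sketch, the fixed-size table, and the function names); it leaves out psref(9) and pserialize(9) entirely and only shows why a lookup by index can report that an interface has gone away, where a stored pointer would dangle.

```c
#include <assert.h>
#include <stddef.h>

#define NIF_SKETCH 8

/* Toy interface object and packet header; illustrative only. */
struct ifnet_sketch {
	unsigned index;
	const char *name;
};

struct mbuf_sketch {
	unsigned rcvif_index;	/* an index, not a pointer */
};

/* Toy interface collection, indexed by interface index. */
static struct ifnet_sketch *if_table_sketch[NIF_SKETCH];

static void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	assert(ifp->index < NIF_SKETCH);
	if_table_sketch[ifp->index] = ifp;
}

static void
if_detach_sketch(struct ifnet_sketch *ifp)
{
	assert(ifp->index < NIF_SKETCH);
	if_table_sketch[ifp->index] = NULL;
}

/* Returns NULL when the receiving interface no longer exists. */
static struct ifnet_sketch *
rcvif_lookup_sketch(const struct mbuf_sketch *m)
{
	if (m->rcvif_index >= NIF_SKETCH)
		return NULL;
	return if_table_sketch[m->rcvif_index];
}
```

In the real kernel the lookup result must additionally be protected until the caller is done with it, which is what the psref and pserialize variants named in the log are for.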
 1.75 26-Jan-2016  knakahara eliminate variable argument in encapsw
 1.74 26-Jan-2016  knakahara implement encapsw instead of protosw and uniform prototype.

suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
 1.73 23-Jan-2016  riastradh Those were local changes not meant to be part of the revert. SORRY!
 1.72 23-Jan-2016  christos fix compilation
 1.71 22-Jan-2016  riastradh Back out previous change to introduce struct encapsw.

This change was intended, but Nakahara-san had already made a better
one locally! So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
 1.70 22-Jan-2016  riastradh Don't abuse struct protosw for ip_encap -- introduce struct encapsw.

Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
 1.69 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
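
The shape of the conversion in revision 1.69 can be illustrated with a fixed-prototype callback. This is a sketch under assumptions: raw_send_sketch, raw_output_fn_sketch and count_output_sketch are hypothetical names with invented arguments, not the signatures of raw_send, route_output or key_output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One explicit prototype replaces a variadic function pointer. */
typedef int (*raw_output_fn_sketch)(const char *msg, void *ctx);

/* The sender takes the callback as an argument instead of a table slot. */
static int
raw_send_sketch(const char *msg, void *ctx, raw_output_fn_sketch output)
{
	assert(output != NULL);
	return output(msg, ctx);
}

/* Example callback: adds the message length to a caller-owned counter. */
static int
count_output_sketch(const char *msg, void *ctx)
{
	size_t *total = ctx;

	*total += strlen(msg);
	return 0;
}
```

With a single typed callback the compiler can check every call site, which a variadic prototype with differing real signatures cannot offer.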
 1.68 18-Jan-2016  knakahara Refactor protosw codes in gif(4). No functional change.

- remove unnecessary include
- reduce scopes
 1.67 25-Dec-2015  knakahara use satosin{,6} macros instead of casts.
 1.66 11-Dec-2015  knakahara PR kern/50522: gif(4) ioctl causes panic while someone is using the gif(4) interface.

It is required to wait for other CPUs' softint completion before disestablishing
the softint handler.
 1.65 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.64 18-May-2014  rmind branches: 1.64.4;
Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.63 01-Mar-2013  joerg branches: 1.63.6; 1.63.10;
Retire OSI network stack. OK core@
 1.62 09-Jan-2012  liamjfoy branches: 1.62.6;
check against NULL
 1.61 17-Jul-2011  joerg branches: 1.61.2; 1.61.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.60 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
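
A compilable version of that switch might look like the sketch below. The flag values match the traditional BSD definitions of IFF_UP and IFF_RUNNING, but they are redefined locally under hypothetical names, and the action chosen for each case is an assumption about typical driver behavior rather than something this commit specifies.

```c
#include <assert.h>

/* Local copies for illustration; a driver would use <net/if.h>. */
#define IFF_UP_SKETCH		0x0001
#define IFF_RUNNING_SKETCH	0x0040

enum if_action_sketch {
	IF_NOTHING,	/* down and stopped: nothing to do */
	IF_STOP,	/* marked down but still running */
	IF_START,	/* marked up but not yet running */
	IF_RECONFIG	/* up and running: reprogram the hardware */
};

static enum if_action_sketch
if_flags_action_sketch(unsigned flags)
{
	switch (flags & (IFF_UP_SKETCH | IFF_RUNNING_SKETCH)) {
	case 0:
		return IF_NOTHING;
	case IFF_RUNNING_SKETCH:
		return IF_STOP;
	case IFF_UP_SKETCH:
		return IF_START;
	case IFF_UP_SKETCH | IFF_RUNNING_SKETCH:
		return IF_RECONFIG;
	}
	return IF_NOTHING;	/* not reached: all four cases are covered */
}
```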

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
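
The address check described here can be sketched as a small predicate. The name ether_addr_settable_sketch is hypothetical and this is not the code in ether_ioctl(); it relies only on the standard Ethernet rule that the low bit of the first octet marks a group (multicast or broadcast) address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN_SKETCH 6

/* True if addr is acceptable as an interface's link-layer address. */
static bool
ether_addr_settable_sketch(const uint8_t addr[ETHER_ADDR_LEN_SKETCH])
{
	static const uint8_t zero[ETHER_ADDR_LEN_SKETCH] = { 0 };

	/* Group bit set: multicast, which includes broadcast. */
	if (addr[0] & 0x01)
		return false;
	/* 00:00:00:00:00:00 is refused as well. */
	if (memcmp(addr, zero, sizeof(zero)) == 0)
		return false;
	return true;
}
```

The address 02:de:ad:be:ef:02 from the summary passes this check, since its first octet has the group bit clear.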

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.59 12-Apr-2008  thorpej branches: 1.59.4; 1.59.10; 1.59.12;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
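
Per-CPU counters that are summed only on request can be modeled as below. This is a userland sketch under assumptions: the CPU count, the two counter names and all function names are hypothetical, and it says nothing about how the kernel stores or locks the real statistics.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_SKETCH 4

enum { STAT_TOTAL_SKETCH, STAT_BADSUM_SKETCH, NSTATS_SKETCH };

/* One private row of counters per CPU: updates never share a row. */
static uint64_t percpu_stats_sketch[NCPU_SKETCH][NSTATS_SKETCH];

static void
stat_inc_sketch(unsigned cpu, unsigned stat)
{
	assert(cpu < NCPU_SKETCH && stat < NSTATS_SKETCH);
	percpu_stats_sketch[cpu][stat]++;
}

/* Collate on demand: sum every CPU's row into the caller's array. */
static void
stats_collate_sketch(uint64_t out[NSTATS_SKETCH])
{
	unsigned cpu, stat;

	for (stat = 0; stat < NSTATS_SKETCH; stat++) {
		out[stat] = 0;
		for (cpu = 0; cpu < NCPU_SKETCH; cpu++)
			out[stat] += percpu_stats_sketch[cpu][stat];
	}
}
```

The cost of summing is paid by the rare reader, while the frequent update path touches only its own CPU's row.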
 1.58 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.57 20-Dec-2007  dyoung branches: 1.57.6;
Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.56 02-May-2007  dyoung branches: 1.56.8; 1.56.16; 1.56.20;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
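
The strdup()/strcpy() analogy in item 1 can be made concrete with a userland sketch. The struct and both functions below are hypothetical stand-ins: they use calloc() where the kernel routines draw from a per-family pool, they take no 'flags' argument, and the length check in the copy routine is an added assumption. Only the family assertion comes from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified BSD-style sockaddr with a length byte; illustrative only. */
struct sockaddr_sketch {
	uint8_t sa_len;
	uint8_t sa_family;
	char sa_data[14];
};

/* Like strdup(): returns a fresh copy, or NULL if allocation fails. */
static struct sockaddr_sketch *
sockaddr_dup_sketch(const struct sockaddr_sketch *src)
{
	struct sockaddr_sketch *dst;

	assert(src->sa_len >= 2 && src->sa_len <= sizeof(*dst));
	dst = calloc(1, sizeof(*dst));
	if (dst == NULL)
		return NULL;
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strcpy(): copies into existing storage and returns dst. */
static struct sockaddr_sketch *
sockaddr_copy_sketch(struct sockaddr_sketch *dst,
    const struct sockaddr_sketch *src)
{
	assert(dst->sa_family == src->sa_family);	/* families alike */
	assert(dst->sa_len <= sizeof(*dst));
	assert(dst->sa_len >= src->sa_len);	/* assumption: must fit */
	memcpy(dst, src, src->sa_len);
	return dst;
}
```

The caller releases a duplicate with free() in this model; in the scheme the log describes, sockaddr_free() returns it to its family's pool instead.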
 1.55 04-Mar-2007  christos branches: 1.55.2; 1.55.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.54 17-Feb-2007  dyoung branches: 1.54.2;
bzero -> memset
 1.53 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.52 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
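
The chaining scheme in this paragraph can be modeled with a singly linked list. The sketch below is an assumption-heavy illustration: the struct layouts, the single global chain, the plain integer reference count and all names ending in _sketch are hypothetical, and it is not the code of in_rtcache(), in_rtflush() or in_rtflushall().

```c
#include <assert.h>
#include <stddef.h>

struct rtentry_sketch {
	int refcnt;
};

struct route_sketch {
	struct rtentry_sketch *ro_rt;	/* NULL when nothing is cached */
	struct route_sketch *next;	/* link in the chain of caches */
};

/* Chain of all caches whose ro_rt is not NULL. */
static struct route_sketch *cache_chain_sketch;

/* Record a newly cached route and add the cache to the chain. */
static void
rtcache_sketch(struct route_sketch *ro, struct rtentry_sketch *rt)
{
	assert(ro->ro_rt == NULL && rt != NULL);
	rt->refcnt++;
	ro->ro_rt = rt;
	ro->next = cache_chain_sketch;
	cache_chain_sketch = ro;
}

/* Invalidate one cache: unlink it, drop the reference, clear ro_rt. */
static void
rtflush_sketch(struct route_sketch *ro)
{
	struct route_sketch **pp;

	if (ro->ro_rt == NULL)
		return;
	for (pp = &cache_chain_sketch; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == ro) {
			*pp = ro->next;
			break;
		}
	}
	ro->ro_rt->refcnt--;
	ro->ro_rt = NULL;
	ro->next = NULL;
}

/* Invalidate every cache on the chain. */
static void
rtflushall_sketch(void)
{
	while (cache_chain_sketch != NULL)
		rtflush_sketch(cache_chain_sketch);
}
```

After a flush, users of a cache see ro_rt == NULL and know they must look the route up again, which is the behavior the later paragraphs of this commit ask every caller to expect.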

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.51 23-Nov-2006  rpaulo New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.50 28-Jul-2006  dyoung branches: 1.50.4; 1.50.6;
Extract predicate M_UNWRITABLE(m, len), which is true iff len
consecutive bytes at the front of m cannot all be written in place (e.g.,
because they are shared or read-only).
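
The predicate from revision 1.50 can be pictured with a toy buffer type. This is a minimal sketch, assuming a simplified mbuf with two boolean flags: mbuf_sketch and m_unwritable_sketch are hypothetical names, and the function reads the macro's name literally, returning true when the first len bytes cannot be written in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy mbuf: only the fields the predicate needs; illustrative only. */
struct mbuf_sketch {
	size_t m_len;		/* bytes of contiguous data at the front */
	bool m_shared;		/* data is referenced by another mbuf */
	bool m_readonly;	/* data lives in read-only storage */
};

/* True if the first len bytes of m must not be modified in place. */
static bool
m_unwritable_sketch(const struct mbuf_sketch *m, size_t len)
{
	return m->m_len < len || m->m_shared || m->m_readonly;
}
```

A caller that intends to write would test this first and obtain a private, contiguous copy when it returns true, which matches the m_pullup() usage described in revision 1.49.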
 1.49 28-Jul-2006  dyoung Fix mtod() usage. If we will write to the mbuf data, check whether
the data is read-only/shared and call m_pullup(). Otherwise,
extract a const pointer to the mbuf data.

XXX I should extract a new macro, M_WRITABLE(m, len), that is true
if m has len consecutive writable bytes at its front.
 1.48 28-Jul-2006  dyoung Where mbuf data may be read-only/shared, use mtod(m, const ...).

Annotate a comparison and m_pullup() that seem unnecessary.
 1.47 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.46 11-Dec-2005  christos branches: 1.46.4; 1.46.6; 1.46.8; 1.46.14;
merge ktrace-lwp.
 1.45 26-Jun-2005  mlelstv branches: 1.45.2;
expire cached route. Fixes PR 22792.
 1.44 02-Jun-2005  tron Change the first argument of the encapsulation check function from
"const struct mbuf *" to "struct mbuf *". Without this change the
actual implementation cannot even use m_copydata() on the mbuf chain
which is broken.
 1.43 02-Jun-2005  tron Remove type casts and lint directives which are no longer necessary
because the first argument of m_copydata() is "const struct mbuf *" now.
 1.42 29-May-2005  christos - add const
- remove bogus casts
- avoid nested variables
 1.41 26-Feb-2005  perry branches: 1.41.2;
nuke trailing whitespace
 1.40 03-Feb-2005  perry ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.39 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.38 01-Feb-2005  he Fix "unused local variable" warning/error if compiling without
bridge support by making variable declaration conditional. Found
while compiling for shark.
 1.37 31-Jan-2005  kim Add RFC 3378 EtherIP support, ported from OpenBSD to NetBSD by
Hans Rosenfeld (rosenfeld at grumpf.hope-2000.org)

This change makes it possible to add gif interfaces to bridges, which
will then send and receive IP protocol 97 packets. Packets are Ethernet
frames with an EtherIP header prepended.
 1.36 26-Apr-2004  matt branches: 1.36.4; 1.36.6;
Remove #else clause of __STDC__
 1.35 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.34 11-Nov-2003  jonathan branches: 1.34.4;
Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
 1.33 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.32 11-Nov-2002  itojun branches: 1.32.6;
make USE_ENCAPCHECK (in netinet*/*gif.c) to global option, GIF_ENCAPCHECK.
#ifdef out unneeded code when possible.
From: Krister Walfridsson <cato@df.lth.se>
 1.31 05-Nov-2002  itojun improve gif lookup performance, when there are many of those,
by using radix tree for lookups. tested by yshimizu@iij.
 1.30 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.29 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.28 14-Jul-2002  itojun need to bzero() before rtalloc. KAME PR 432
 1.27 09-Jun-2002  itojun whitespace
 1.26 21-Dec-2001  itojun branches: 1.26.8; 1.26.10;
move protosw fragment for gif/stf to their own source code.
reduce #ifdef in stf code. sync with kame
 1.25 13-Nov-2001  lukem add RCSIDs
 1.24 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.23 16-Aug-2001  itojun branches: 1.23.4;
gif interface now uses generic software interrupt
(on archs that support it). also, make gif ALTQ-capable on outgoing.
sync with kame, comments from thorpej.
 1.22 29-Jul-2001  itojun sync gif interface code with latest kame.
IFF_RUNNING is clarified. attach/detach logic is cleaner.
the old code mistakenly set IFF_UP by itself, now the behavior is gone.
 1.21 14-May-2001  itojun branches: 1.21.2;
drop multi destination mode (IFF_LINK0).
 1.20 10-May-2001  itojun correct ecn consideration on tunnel encap/decap. sync with kame.
 1.19 20-Feb-2001  itojun branches: 1.19.2;
add AF_ISO case to output. from chopps.
 1.18 20-Feb-2001  itojun ISO over IPv4/v6 by EON encapsulation. from chopps, sync with kame.
 1.17 22-Jan-2001  itojun revert revision 1.15 (on ingress, DF bit copied from inner to outer).

since we do not have feedback mechanism from path MTU to tunnel MTU
(not sure if we should), and inner packet source will not get informed of
outer PMTUD (we shouldn't do this), 1.15 behavior can lead us to
blackhole behavior.

configurable behavior (as suggested in RFC2401 6.1) would be nice to have,
however, reusing net.inet.ipsec.dfbit would be hairy.
 1.16 22-Jan-2001  itojun make it possible to turn off ingress filter on gif/stf tunnel egress,
by using IFF_LINK2. (part of) PR 11163 from Ken Raeburn.
 1.15 05-Jul-2000  thorpej RFCs 1853, 2003, 2401 -- copy the DF bit.
 1.14 26-Apr-2000  itojun branches: 1.14.4;
sync with more recent kame. defer inclusion of net/if_gif.h.
 1.13 20-Apr-2000  enami IN_MULTICAST() takes in_addr.s_addr as argument, not pointer to it.
 1.12 19-Apr-2000  itojun introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.11 30-Mar-2000  augustss Remove register declarations.
 1.10 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.9 07-Feb-2000  itojun s/DIAGNOSTIC/DEBUG/
 1.8 06-Jan-2000  itojun remove too much portability code in KAME, to improve readability.
 1.7 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.6 20-Aug-1999  itojun branches: 1.6.2; 1.6.8;
do not capture packets by gif, when gif interface is down.
 1.5 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.4 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in_gif.c was initially added on branch kame.
 1.1.2.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in_gif.c was added on branch chs-ubc2 on 1999-07-01 23:47:00 +0000
 1.6.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.6.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.6.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.4.1 01-May-2001  he Pull up revision 1.16 (via patch, requested by itojun):
Make it possible to turn off ingress filter on gif/stf tunnel
egress by using IFF_LINK2. Fixes (part of) PR#11163.
 1.19.2.10 11-Dec-2002  thorpej Sync with HEAD.
 1.19.2.9 11-Nov-2002  nathanw Catch up to -current
 1.19.2.8 17-Sep-2002  nathanw Catch up to -current.
 1.19.2.7 27-Aug-2002  nathanw Catch up to -current.
 1.19.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.19.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.19.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.19.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.19.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.19.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.21.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.21.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.21.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.21.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.21.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.21.2.1 03-Aug-2001  lukem update to -current
 1.23.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.26.10.1 15-Jul-2002  thorpej pullup-1-6 ticket #506 (itojun).

Original log message:
need to bzero() before rtalloc. KAME PR 432.
 1.26.8.3 29-Aug-2002  gehenna catch up with -current.
 1.26.8.2 15-Jul-2002  gehenna catch up with -current.
 1.26.8.1 20-Jun-2002  gehenna catch up with -current.
 1.32.6.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.32.6.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.32.6.4 04-Feb-2005  skrll Sync with HEAD.
 1.32.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.32.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.32.6.1 03-Aug-2004  skrll Sync with HEAD
 1.34.4.1 09-Jan-2006  tron Pull up following revision(s) (requested by mlelstv in ticket #10214):
sys/netinet6/in6_gif.c: revision 1.43
sys/netinet/in_gif.c: revision 1.45
sys/net/if_gif.h: revision 1.11
expire cached route. Fixes PR 22792.
 1.36.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.36.6.1 12-Feb-2005  yamt sync with head.
 1.36.4.1 29-Apr-2005  kent sync with -current
 1.41.2.1 08-Jan-2006  riz Pull up following revision(s) (requested by mlelstv in ticket #1092):
sys/netinet6/in6_gif.c: revision 1.43
sys/netinet/in_gif.c: revision 1.45
sys/net/if_gif.h: revision 1.11
expire cached route. Fixes PR 22792.
 1.45.2.5 21-Jan-2008  yamt sync with head
 1.45.2.4 03-Sep-2007  yamt sync with head.
 1.45.2.3 26-Feb-2007  yamt sync with head.
 1.45.2.2 30-Dec-2006  yamt sync with head.
 1.45.2.1 21-Jun-2006  yamt sync with head.
 1.46.14.1 19-Jun-2006  chap Sync with head.
 1.46.8.2 11-Aug-2006  yamt sync with head
 1.46.8.1 26-Jun-2006  yamt sync with head.
 1.46.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.46.4.1 09-Sep-2006  rpaulo sync with head
 1.50.6.2 18-Dec-2006  yamt sync with head.
 1.50.6.1 10-Dec-2006  yamt sync with head.
 1.50.4.1 12-Jan-2007  ad Sync with head.
 1.54.2.3 07-May-2007  yamt sync with head.
 1.54.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.54.2.1 17-Feb-2007  rmind file in_gif.c was added on branch yamt-idlelwp on 2007-03-12 05:59:36 +0000
 1.55.4.1 11-Jul-2007  mjf Sync with head.
 1.55.2.1 08-Jun-2007  ad Sync with head.
 1.56.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.56.16.1 26-Dec-2007  ad Sync with head.
 1.56.8.1 09-Jan-2008  matt sync with HEAD
 1.57.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.57.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.59.12.1 19-Jan-2009  skrll Sync with HEAD.
 1.59.10.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.59.4.1 04-May-2009  yamt sync with head.
 1.61.6.1 18-Feb-2012  mrg merge to -current.
 1.61.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.61.2.1 17-Apr-2012  yamt sync with head
 1.62.6.3 03-Dec-2017  jdolecek update from HEAD
 1.62.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.62.6.1 23-Jun-2013  tls resync from head
 1.63.10.1 10-Aug-2014  tls Rebase.
 1.63.6.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.64.4.6 05-Feb-2017  skrll Sync with HEAD
 1.64.4.5 05-Oct-2016  skrll Sync with HEAD
 1.64.4.4 09-Jul-2016  skrll Sync with HEAD
 1.64.4.3 19-Mar-2016  skrll Sync with HEAD
 1.64.4.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.64.4.1 22-Sep-2015  skrll Sync with HEAD
 1.81.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.81.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.87.8.6 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.87.8.5 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They store the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.87.8.4 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.87.8.3 02-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #462):
sys/net/if_gif.c: revision 1.133, 1.134, 1.137
sys/net/if_gif.h: revision 1.28-1.29
sys/netinet/in_gif.c: revision 1.90-1.91
sys/netinet/in_gif.h: revision 1.18
sys/netinet6/in6_gif.c: revision 1.88-1.89
sys/netinet6/in6_gif.h: revision 1.17
preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).
After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).
update locking notes later.
--
update gif(4) locking notes.
--
IFF_RUNNING checking in Rx and Tx processing is unnecessary now.
Because the configs of gif (members of gif_var) are protected by psref(9).
--
remove duplicated null check
 1.87.8.2 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.87.8.1 24-Oct-2017  snj Pull up following revision(s) (requested by knahakara in ticket #303):
sys/net/if_gif.c: 1.129-1.130
sys/net/if_gif.h: 1.26-1.27
sys/netinet/in_gif.c: 1.88
sys/netinet6/in6_gif.c: 1.86
add lock for percpu route like l2tp(4).
--
add lock for sclist to exclude ifconfig gifX add/delete and ifconfig gifX tunnel
--
update locking notes.
 1.92.2.1 02-May-2018  pgoyette Synch with HEAD
 1.94.6.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.94.2.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.18 27-Nov-2017  knakahara preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).

After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).

update locking notes later.
 1.17 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.16 04-Jul-2016  knakahara branches: 1.16.10;
fix: gif(4) receive side race

A panic occurs in rn_match() called by encap[46]_lookup(). The reason is that
gif(4) does not suspend receive packet processing in spite of suspending
transmit packet processing while anyone is doing gif(4) ioctl.
 1.15 26-Jan-2016  knakahara eliminate variable argument in encapsw
 1.14 23-Nov-2006  rpaulo branches: 1.14.98; 1.14.118;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.13 10-Dec-2005  elad branches: 1.13.20; 1.13.22;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.12 06-Jun-2005  martin branches: 1.12.2;
Since we decided "const struct mbuf *" would not do the right thing (tm),
remove ~all const from mbuf pointers.
 1.11 31-Jan-2005  kim Add RFC 3378 EtherIP support, ported from OpenBSD to NetBSD by
Hans Rosenfeld (rosenfeld at grumpf.hope-2000.org)

This change makes it possible to add gif interfaces to bridges, which
will then send and receive IP protocol 97 packets. Packets are Ethernet
frames with an EtherIP header prepended.
 1.10 21-Apr-2004  itojun branches: 1.10.4; 1.10.6;
no space between function name and paren: foo (blah) -> foo(blah)
 1.9 18-Apr-2004  matt De __P()
 1.8 11-Nov-2002  itojun branches: 1.8.6;
make USE_ENCAPCHECK (in netinet*/*gif.c) to global option, GIF_ENCAPCHECK.
#ifdef out unneeded code when possible.
From: Krister Walfridsson <cato@df.lth.se>
 1.7 16-Aug-2001  itojun gif interface now uses generic software interrupt
(on archs that support it). also, make gif ALTQ-capable on outgoing.
sync with kame, comments from thorpej.
 1.6 29-Jul-2001  itojun sync gif interface code with latest kame.
IFF_RUNNING is clarified. attach/detach logic is cleaner.
the old code mistakenly set IFF_UP by itself, now the behavior is gone.
 1.5 19-Apr-2000  itojun branches: 1.5.6; 1.5.8;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.4 06-Jul-1999  itojun branches: 1.4.2;
sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file in_gif.h was initially added on branch kame.
 1.1.2.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file in_gif.h was added on branch chs-ubc2 on 1999-07-01 23:47:00 +0000
 1.4.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.8.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.5.8.1 03-Aug-2001  lukem update to -current
 1.5.6.2 11-Dec-2002  thorpej Sync with HEAD.
 1.5.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.8.6.6 11-Dec-2005  christos Sync with head.
 1.8.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.8.6.4 04-Feb-2005  skrll Sync with HEAD.
 1.8.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.8.6.1 03-Aug-2004  skrll Sync with HEAD
 1.10.6.1 12-Feb-2005  yamt sync with head.
 1.10.4.1 29-Apr-2005  kent sync with -current
 1.12.2.2 30-Dec-2006  yamt sync with head.
 1.12.2.1 21-Jun-2006  yamt sync with head.
 1.13.22.1 10-Dec-2006  yamt sync with head.
 1.13.20.1 12-Jan-2007  ad Sync with head.
 1.14.118.2 09-Jul-2016  skrll Sync with HEAD
 1.14.118.1 19-Mar-2016  skrll Sync with HEAD
 1.14.98.1 03-Dec-2017  jdolecek update from HEAD
 1.16.10.2 02-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #462):
sys/net/if_gif.c: revision 1.133, 1.134, 1.137
sys/net/if_gif.h: revision 1.28-1.29
sys/netinet/in_gif.c: revision 1.90-1.91
sys/netinet/in_gif.h: revision 1.18
sys/netinet6/in6_gif.c: revision 1.88-1.89
sys/netinet6/in6_gif.h: revision 1.17
preserve gif(4) configs by psref(9) like vlan(4) and l2tp(4).
After Tx side does not use softint, gif(4) can use psref(9) for config
preservation like vlan(4) and l2tp(4).
update locking notes later.
--
update gif(4) locking notes.
--
IFF_RUNNING checking in Rx and Tx processing is unnecessary now.
Because the configs of gif (members of gif_var) are protected by psref(9).
--
remove duplicated null check
 1.16.10.1 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.1 13-Nov-2006  dyoung branches: 1.1.2; 1.1.6; 1.1.8;
Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
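The address predicates described in item 3 above can be sketched in user-space C as follows. This is a minimal illustration, not the kernel's macro definitions: the sketch_* names are hypothetical, addresses are assumed to be in host byte order, and treating "link-local multicast" as 224.0.0.0/24 is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four octets (hypothetical helper). */
static uint32_t sketch_ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Link-local unicast: 169.254/16. */
static int sketch_in_linklocal(uint32_t addr)
{
	return (addr & 0xffff0000u) == 0xa9fe0000u;
}

/* RFC 1918 private space: 10/8, 172.16/12 and 192.168/16. */
static int sketch_in_private(uint32_t addr)
{
	return (addr & 0xff000000u) == 0x0a000000u ||
	    (addr & 0xfff00000u) == 0xac100000u ||
	    (addr & 0xffff0000u) == 0xc0a80000u;
}

/* Assumption: link-local multicast means the 224.0.0.0/24 block. */
static int sketch_in_local_group(uint32_t addr)
{
	return (addr & 0xffffff00u) == 0xe0000000u;
}

/* True for link-local unicast or link-local multicast. */
static int sketch_in_any_local(uint32_t addr)
{
	return sketch_in_linklocal(addr) || sketch_in_local_group(addr);
}
```

For example, sketch_in_private() accepts 172.31.255.255 but rejects 172.32.0.0, because only the top 12 bits are compared for the 172.16/12 block.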
 1.1.8.2 30-Dec-2006  yamt sync with head.
 1.1.8.1 13-Nov-2006  yamt file in_ifattach.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.1.6.2 10-Dec-2006  yamt sync with head.
 1.1.6.1 13-Nov-2006  yamt file in_ifattach.h was added on branch yamt-splraiseipl on 2006-12-10 07:19:10 +0000
 1.1.2.2 18-Nov-2006  ad Sync with head.
 1.1.2.1 13-Nov-2006  ad file in_ifattach.h was added on branch newlock2 on 2006-11-18 21:39:36 +0000
 1.22 01-Sep-2023  andvar fix typos in comments, mainly s/innner/inner/.
 1.21 07-Dec-2022  knakahara gif(4), ipsec(4) and l2tp(4) use encap_attach_addr().
 1.20 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
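The difference between a sizeof-based and an alignof-based check can be sketched in standard C11. The macro and type names below are hypothetical stand-ins, not the kernel's definitions, and the sketch assumes a platform where a struct of three uint32_t members has size 12 and alignment 4 (the static assertions state this).

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* A non-primitive type whose size (12) is not a power of two. */
struct hdr_sketch {
	uint32_t a, b, c;
};

/* Assumptions of this sketch; they hold on common 32- and 64-bit ABIs. */
static_assert(sizeof(struct hdr_sketch) == 12, "sketch assumes size 12");
static_assert(alignof(struct hdr_sketch) == 4, "sketch assumes alignment 4");

/* Old style: the mask is derived from the size of the type. */
#define SKETCH_ALIGNED_BY_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* New style: the mask is derived from the ABI alignment of the type. */
#define SKETCH_ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)

/* A 16-byte-aligned buffer so that offsets into it have known low bits. */
static alignas(16) unsigned char sketch_buf[32];

static int sketch_by_sizeof(const void *p)
{
	return SKETCH_ALIGNED_BY_SIZEOF(p, struct hdr_sketch);
}

static int sketch_by_alignof(const void *p)
{
	return SKETCH_ALIGNED_BY_ALIGNOF(p, struct hdr_sketch);
}
```

At sketch_buf + 8 the address is a multiple of 4, so the alignof-based check accepts it, while the sizeof-based check masks with 11 (binary 1011) and rejects it even though the pointer is suitably aligned for the type.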
 1.19 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.18 29-Jan-2020  thorpej branches: 1.18.6;
Adopt <net/if_stats.h>.
 1.17 19-Sep-2019  knakahara branches: 1.17.2;
Avoid having a rtcache directly in a percpu storage for tunnel protocols.

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
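The hazard described above, and the pointer indirection that avoids it, can be imitated in user space. This is only a sketch under stated assumptions: store_sketch stands in for one CPU's percpu storage, store_enlarge() imitates the "allocate a larger area, copy, destroy the old one" behaviour, and none of these names are the percpu(9) API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct rtcache_sketch { int value; };		/* stand-in for a route cache */
struct store_sketch { unsigned char *base; size_t size; };

static int store_init(struct store_sketch *s, size_t size)
{
	s->base = calloc(1, size);
	s->size = (s->base != NULL) ? size : 0;
	return (s->base != NULL) ? 0 : -1;
}

/* Enlarge by allocating a new area, copying, and destroying the old one. */
static int store_enlarge(struct store_sketch *s, size_t newsize)
{
	unsigned char *n;

	assert(newsize >= s->size);
	n = calloc(1, newsize);
	if (n == NULL)
		return -1;
	memcpy(n, s->base, s->size);
	free(s->base);
	s->base = n;
	s->size = newsize;
	return 0;
}

/* Object stored directly: returns 1 if its address changed on enlargement. */
static int direct_slot_moves(void)
{
	struct store_sketch s;
	uintptr_t before, after;

	if (store_init(&s, sizeof(struct rtcache_sketch)) != 0)
		return -1;
	before = (uintptr_t)s.base;	/* where a sleeping thread still points */
	if (store_enlarge(&s, 4 * sizeof(struct rtcache_sketch)) != 0) {
		free(s.base);
		return -1;
	}
	after = (uintptr_t)s.base;
	free(s.base);
	return before != after;
}

/* Only a pointer stored: returns 1 if the object stays valid and in place. */
static int indirect_object_survives(void)
{
	struct store_sketch s;
	struct rtcache_sketch *obj, *held, *reloaded;
	int ok;

	if (store_init(&s, sizeof(obj)) != 0)
		return -1;
	obj = malloc(sizeof(*obj));
	if (obj == NULL) {
		free(s.base);
		return -1;
	}
	obj->value = 42;
	memcpy(s.base, &obj, sizeof(obj));	/* the slot holds just the pointer */
	held = obj;				/* kept across a possible sleep */
	if (store_enlarge(&s, 4 * sizeof(obj)) != 0) {
		free(obj);
		free(s.base);
		return -1;
	}
	memcpy(&reloaded, s.base, sizeof(reloaded));
	ok = (reloaded == held) && (held->value == 42);
	free(obj);
	free(s.base);
	return ok;
}
```

In the direct layout the old address is dead after enlargement, so any thread that kept it across a sleep would touch freed memory. In the indirect layout only the small pointer slot moves; the separately allocated object that the thread holds is unaffected.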
 1.16 03-Sep-2018  knakahara branches: 1.16.4;
fix: l2tp(4) cannot receive packets after reset session without reset tunnel. Pointed out by k-goda@IIJ

When the following operations are done after established session, the l2tp0
cannot receive packets until done deletetunnel && tunnel "src" "dst".
 1.15 21-Jun-2018  knakahara branches: 1.15.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held, that is, in case of !defined(NET_MPSAFE) it is
acquired in ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.14 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.13 27-Apr-2018  knakahara Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They store the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.12 26-Jan-2018  maxv branches: 1.12.2;
Several fixes in L2TP:

* l2tp_input(): use m_copydata, and ensure there is enough space in the
chain. Otherwise overflow.

* l2tp_tcpmss_clamp(): ensure there is enough space in the chain.

* in_l2tp_output(): don't check 'sc' against NULL, it can't be NULL.

* in_l2tp_input(): no need to call m_pullup since we use m_copydata.
Just check the space in the chain.

* in_l2tp_input(): if there is a cookie, make sure the chain has enough
space.

* in6_l2tp_input(): same changes as in_l2tp_input().

Ok knakahara@
 1.11 25-Jan-2018  maxv Style, reduce the indentation level when possible, and add a missing NULL
check after M_PREPEND.
 1.10 22-Jan-2018  maxv Fix null deref, m could be NULL if M_PREPEND fails.
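The class of bug fixed in 1.10 and 1.11 can be illustrated with a user-space stand-in for a prepend operation that consumes the packet on failure. This is a simplified sketch, not mbuf code: pkt_sketch, pkt_alloc(), pkt_prepend() and add_header() are hypothetical, and here the prepend fails when leading space runs out, whereas the real M_PREPEND may allocate a new mbuf and fails only if that allocation fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: valid data lives at buf + off. */
struct pkt_sketch {
	unsigned char buf[64];
	size_t off;	/* leading space still available before the data */
	size_t len;	/* bytes of valid data */
};

static struct pkt_sketch *pkt_alloc(size_t leading, size_t len)
{
	struct pkt_sketch *p;

	if (leading + len > sizeof(p->buf))
		return NULL;
	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return NULL;
	p->off = leading;
	p->len = len;
	return p;
}

/* On failure the packet is freed and *pp is set to NULL. */
static void pkt_prepend(struct pkt_sketch **pp, size_t hlen)
{
	struct pkt_sketch *p = *pp;

	if (p->off < hlen) {
		free(p);
		*pp = NULL;
		return;
	}
	p->off -= hlen;
	p->len += hlen;
}

/* Returns 0 on success, -1 if the packet was lost during prepend. */
static int add_header(struct pkt_sketch **pp, const void *hdr, size_t hlen)
{
	pkt_prepend(pp, hlen);
	if (*pp == NULL)	/* the check that must not be skipped */
		return -1;
	memcpy((*pp)->buf + (*pp)->off, hdr, hlen);
	return 0;
}
```

Without the NULL check in add_header(), the memcpy() would dereference a NULL pointer whenever the prepend fails, which is the pattern the commit message describes.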
 1.9 18-Dec-2017  knakahara fix mbuf leaks. pointed out and suggested by kre@n.o, thanks.
 1.8 18-Dec-2017  knakahara backout wrong fix again, sorry.
 1.7 15-Dec-2017  knakahara Fix pullup'ed mbuf leaks. The match function just requires enough mbuf length.

XXX need pullup-8
 1.6 15-Dec-2017  knakahara backout wrong fix as it causes atf net/ipsec/t_ipsec_l2tp failures.
 1.5 11-Dec-2017  knakahara fix pullup'ed mbuf leaks. pointed out by maxv@n.o, thanks.

XXX need pullup-8
 1.4 15-Nov-2017  knakahara branches: 1.4.2;
Add argument to encapsw->pr_input() instead of m_tag.
 1.3 11-Jul-2017  knakahara branches: 1.3.4;
l2tp(4): fix mbuf leak when tunnel nested over the limit

XXX need pullup -8 branch
 1.2 30-Mar-2017  knakahara branches: 1.2.4; 1.2.8;
remove duplicated validation. That is already done in l2tp_lookup_session_ref().

pointed out by s-yamaguchi@IIJ, thanks.
 1.1 16-Feb-2017  knakahara branches: 1.1.2;
add missing files.
 1.1.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 16-Feb-2017  pgoyette file in_l2tp.c was added on branch pgoyette-localcount on 2017-03-20 06:57:50 +0000
 1.2.8.8 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.2.8.7 10-Sep-2018  martin Pull up following revision(s) (requested by knakahara in ticket #1018):

sys/netinet6/in6_l2tp.c: revision 1.17
sys/netinet/in_l2tp.c: revision 1.16

fix: l2tp(4) cannot receive packets after reset session without reset tunnel. Pointed out by k-goda@IIJ

When the following operations are done after established session, the l2tp0
cannot receive packets until done deletetunnel && tunnel "src" "dst".
 1.2.8.6 13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires that a lock be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: in the !defined(NET_MPSAFE) case it is
acquired in ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.2.8.5 17-May-2018  martin Pull up following revision(s) (requested by knakahara in ticket #829):

sys/net/if_l2tp.c: revision 1.24
sys/net/if_ipsec.c: revision 1.13
sys/net/if_gif.h: revision 1.31
sys/netipsec/ipsecif.c: revision 1.8
sys/net/if_gif.c: revision 1.140
sys/netinet6/in6_l2tp.c: revision 1.15
sys/net/if_ipsec.h: revision 1.3
sys/netinet6/in6_gif.c: revision 1.92
sys/net/if_l2tp.h: revision 1.5
sys/netinet/in_l2tp.c: revision 1.13
sys/netinet/in_gif.c: revision 1.93

Fix LOCKDEBUG kernel panic when many (about 200) tunnel interfaces are created.

The tunnel interfaces are gif(4), l2tp(4), and ipsecif(4). They store the mutex
itself in the percpu area. When percpu_cpu_enlarge() runs, the address of the
mutex in the percpu area becomes different from the address which lockdebug
saved. That can cause a false "already initialized" detection.
 1.2.8.4 08-Mar-2018  martin Pull up following revision(s) (requested by knakahara in ticket #614):
sys/net/if_l2tp.c: revision 1.20
sys/netinet6/in6_l2tp.c: revision 1.13
sys/netinet6/in6_l2tp.c: revision 1.14
sys/net/if_l2tp.h: revision 1.3
sys/net/if_l2tp.c: revision 1.13
sys/netinet/in_l2tp.c: revision 1.10
sys/net/if_l2tp.c: revision 1.18
sys/netinet/in_l2tp.c: revision 1.11
sys/net/if_l2tp.c: revision 1.19
sys/netinet/in_l2tp.c: revision 1.12

If if_attach() failed in the attach function, return. Add comments about if_initialize().
suggested by ozaki-r@n.o.

Fix null deref, m could be NULL if M_PREPEND fails.

style

Style, reduce the indentation level when possible, and add a missing NULL
check after M_PREPEND.

Several fixes in L2TP:
* l2tp_input(): use m_copydata, and ensure there is enough space in the
chain. Otherwise overflow.
* l2tp_tcpmss_clamp(): ensure there is enough space in the chain.
* in_l2tp_output(): don't check 'sc' against NULL, it can't be NULL.
* in_l2tp_input(): no need to call m_pullup since we use m_copydata.
Just check the space in the chain.
* in_l2tp_input(): if there is a cookie, make sure the chain has enough
space.
* in6_l2tp_input(): same changes as in_l2tp_input().
Ok knakahara@

Use MH_ALIGN instead, ok knakahara@.
 1.2.8.3 02-Jan-2018  snj Pull up following revision(s) (requested by knakahara in ticket #461):
sys/netinet/in_l2tp.c: revision 1.9
sys/netinet6/in6_l2tp.c: revision 1.12
fix mbuf leaks. pointed out and suggested by kre@n.o, thanks.
 1.2.8.2 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.2.8.1 12-Jul-2017  martin Pull up following revision(s) (requested by knakahara in ticket #121):
sys/netinet6/in6_l2tp.c: revision 1.6
sys/netinet/in_l2tp.c: revision 1.3
l2tp(4): fix mbuf leak when tunnel nested over the limit
XXX need pullup -8 branch
 1.2.4.2 21-Apr-2017  bouyer Sync with HEAD
 1.2.4.1 30-Mar-2017  bouyer file in_l2tp.c was added on branch bouyer-socketcan on 2017-04-21 16:54:05 +0000
 1.3.4.2 28-Aug-2017  skrll Sync with HEAD
 1.3.4.1 11-Jul-2017  skrll file in_l2tp.c was added on branch nick-nhusb on 2017-08-28 17:53:12 +0000
 1.4.2.2 03-Dec-2017  jdolecek update from HEAD
 1.4.2.1 15-Nov-2017  jdolecek file in_l2tp.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.12.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.12.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.12.2.1 02-May-2018  pgoyette Synch with HEAD
 1.15.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.15.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.15.2.1 10-Jun-2019  christos Sync with HEAD
 1.16.4.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
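
The percpu(9) notes above describe why an object stored directly in percpu
storage can move, while an object reached through a stored pointer does not. A
minimal user-space sketch of that idea follows. It only simulates an area that
is reallocated on growth; struct percpu_area, area_enlarge() and the other
names are hypothetical and are not the percpu(9) API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical payload, standing in for a rtcache or an ifqueue. */
struct payload {
	int value;
};

/* Simulated per-CPU area that may be replaced by a larger one. */
struct percpu_area {
	void *base;
	size_t size;
};

/*
 * Grow the area: allocate a larger one, copy the old contents and free
 * the old area.  Anything stored directly in the area changes address.
 */
static int
area_enlarge(struct percpu_area *pa, size_t newsize)
{
	void *nbase;

	if (newsize < pa->size)
		return -1;
	nbase = calloc(1, newsize);
	if (nbase == NULL)
		return -1;
	memcpy(nbase, pa->base, pa->size);
	free(pa->base);
	pa->base = nbase;
	pa->size = newsize;
	return 0;
}

/* Store only a pointer in the area; the payload itself lives on the heap. */
static int
area_init_indirect(struct percpu_area *pa, int value)
{
	struct payload *p = malloc(sizeof(*p));

	if (p == NULL)
		return -1;
	pa->base = calloc(1, sizeof(p));
	if (pa->base == NULL) {
		free(p);
		return -1;
	}
	p->value = value;
	memcpy(pa->base, &p, sizeof(p));
	pa->size = sizeof(p);
	return 0;
}

/* Fetch the payload pointer currently stored in the area. */
static struct payload *
area_get_indirect(const struct percpu_area *pa)
{
	struct payload *p;

	memcpy(&p, pa->base, sizeof(p));
	return p;
}
```

In this sketch, a payload pointer fetched before area_enlarge() stays valid
afterwards, because only the slot holding the pointer was relocated.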
 1.17.2.1 29-Feb-2020  ad Sync with head.
 1.18.6.1 03-Apr-2021  thorpej Sync with HEAD.
 1.1 16-Feb-2017  knakahara branches: 1.1.2; 1.1.6; 1.1.14; 1.1.18;
add missing files.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 16-Feb-2017  jdolecek file in_l2tp.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.14.2 28-Aug-2017  skrll Sync with HEAD
 1.1.14.1 16-Feb-2017  skrll file in_l2tp.h was added on branch nick-nhusb on 2017-08-28 17:53:12 +0000
 1.1.6.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.6.1 16-Feb-2017  bouyer file in_l2tp.h was added on branch bouyer-socketcan on 2017-04-21 16:54:05 +0000
 1.1.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.2.1 16-Feb-2017  pgoyette file in_l2tp.h was added on branch pgoyette-localcount on 2017-03-20 06:57:50 +0000
 1.15 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.14 27-Mar-2020  jdolecek branches: 1.14.28;
fix in4_cksum() panic "in4_cksum: mbuf 14 too short for IP header 20"
triggered by bridge_output() when passing packet originally for
interface supporting hw csum offload to destination interface
not supporting it

problem happens because bridge_output() is called after ether_output()
M_PREPEND() the ether_header into the mbuf chain, if there is not
enough space on the first mbuf of the chain, it ends up prepending
a new short mbuf with just ether_header

triggered by running UDP (IPv4) 'netio -u' benchmark with packet size 2 KB

XXX seems in6_undefer_cksum() should have similar fix, however I was
XXX not able to trigger the problem there
 1.13 12-Dec-2018  rin PR kern/53562

Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
Since this violates separation b/w L2 and L3/L4, new files are added
rather than having the routines in sys/net/if_ethersubr.c.

OK msaitoh thorpej
 1.12 19-Sep-2018  rin Fix in_undefer_cksum() and in6_undefer_cksum().

The 4th argument for in[46]_cksum() should be length of L4 header +
L4 payload. The previous revisions are wrong

- for IPv4 when hdrlen != 0
- for IPv6 always

These functions are used only in net/if_loop.c and
arch/powerpc/booke/dev/pq3etsec.c under some special circumstances.
This is probably why the bugs were not found until today.

OK maxv
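
The length rule stated above (L4 header plus L4 payload) is simple arithmetic
on the IP header fields. A minimal sketch follows, assuming host-byte-order
inputs. l4_cksum_len_v4() and l4_cksum_len_v6() are hypothetical helpers for
illustration, not functions from in_offload.c.

```c
#include <stdint.h>

/*
 * IPv4: the total-length field covers the IP header too, so the L4
 * length is the total length minus the IP header length (in bytes).
 * Returns -1 if the two fields are inconsistent.
 */
static int
l4_cksum_len_v4(uint16_t ip_total_len, uint16_t ip_hdr_len)
{
	if (ip_hdr_len < 20 || ip_hdr_len > ip_total_len)
		return -1;
	return ip_total_len - ip_hdr_len;
}

/*
 * IPv6: the payload-length field excludes the fixed 40-byte header but
 * includes any extension headers, so those are subtracted instead.
 * Returns -1 if the extension headers exceed the payload.
 */
static int
l4_cksum_len_v6(uint16_t payload_len, uint16_t ext_hdr_len)
{
	if (ext_hdr_len > payload_len)
		return -1;
	return payload_len - ext_hdr_len;
}
```

For example, a 1500-byte IPv4 packet with a 20-byte header gives an L4 length
of 1480 bytes in this sketch.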
 1.11 11-Jul-2018  maxv Add KASSERTs in in_undefer_cksum_tcpudp.
 1.10 11-Jul-2018  maxv Style, rename 'iph' -> 'ip', and reduce the diff between
in_undefer_cksum_tcpudp and the last part of in_undefer_cksum.
 1.9 11-Jul-2018  maxv Remove the callback, localify, and add a comment.
 1.8 11-Jul-2018  maxv Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.
 1.7 26-Apr-2016  ozaki-r branches: 1.7.16; 1.7.18;
Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that the rt_gwroute checks hid the mistakes, so it appeared to
work (unexpectedly), and removing the rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to take care of is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.6 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find out detailed investigations by dyoung about the
issue in there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.5 25-Apr-2011  yamt branches: 1.5.14; 1.5.32;
ip_undefer_csum:
- don't forget ntohs.
- don't add hdrlen twice for l4 header offset.
- use M_CSUM_DATA_IPv4_IPHL instead of extracting it from ip header.
- simplify code.
- KNF.
 1.4 14-Apr-2011  yamt after ip_input.c rev.1.285 and 1.286, restore kernel_lock for if_output.
 1.3 11-Dec-2010  matt branches: 1.3.2;
Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.
 1.2 24-Apr-2007  dyoung branches: 1.2.56; 1.2.60;
Constify.
 1.1 25-Nov-2006  yamt branches: 1.1.4; 1.1.6; 1.1.8; 1.1.10; 1.1.14; 1.1.16;
move tso-by-software code to their own files. no functional changes.
 1.1.16.1 11-Jul-2007  mjf Sync with head.
 1.1.14.1 08-Jun-2007  ad Sync with head.
 1.1.10.1 07-May-2007  yamt sync with head.
 1.1.8.2 12-Jan-2007  ad Sync with head.
 1.1.8.1 25-Nov-2006  ad file in_offload.c was added on branch newlock2 on 2007-01-12 01:04:14 +0000
 1.1.6.3 03-Sep-2007  yamt sync with head.
 1.1.6.2 30-Dec-2006  yamt sync with head.
 1.1.6.1 25-Nov-2006  yamt file in_offload.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.1.4.2 10-Dec-2006  yamt sync with head.
 1.1.4.1 25-Nov-2006  yamt file in_offload.c was added on branch yamt-splraiseipl on 2006-12-10 07:19:10 +0000
 1.2.60.1 07-Jan-2011  matt If using hardware checksum offload and the packet can't be h/w checksummed
(for whatever reason, some hardware is stupid) allow the driver to calculate
the checksum instead.
 1.2.56.3 31-May-2011  rmind sync with head
 1.2.56.2 21-Apr-2011  rmind sync with head
 1.2.56.1 05-Mar-2011  rmind sync with head
 1.3.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.32.2 29-May-2016  skrll Sync with HEAD
 1.5.32.1 06-Jun-2015  skrll Sync with HEAD
 1.5.14.1 03-Dec-2017  jdolecek update from HEAD
 1.7.18.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.7.18.1 10-Jun-2019  christos Sync with HEAD
 1.7.16.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.7.16.2 30-Sep-2018  pgoyette Sync with HEAD
 1.7.16.1 28-Jul-2018  pgoyette Sync with HEAD
 1.14.28.1 02-Aug-2025  perseant Sync with HEAD
 1.12 12-Dec-2018  rin PR kern/53562

Add ether_sw_offload_[tr]x: handle TX/RX offload options in software.
Since this violates separation b/w L2 and L3/L4, new files are added
rather than having the routines in sys/net/if_ethersubr.c.

OK msaitoh thorpej
 1.11 11-Jul-2018  maxv Style, rename 'iph' -> 'ip', and reduce the diff between
in_undefer_cksum_tcpudp and the last part of in_undefer_cksum.
 1.10 11-Jul-2018  maxv Remove the callback, localify, and add a comment.
 1.9 11-Jul-2018  maxv Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.
 1.8 25-Apr-2011  yamt branches: 1.8.54; 1.8.56;
undefer csum in looutput.
looutput is used by various code (ether_output, mcast) to loopback packets.
 1.7 11-Dec-2010  matt branches: 1.7.2;
Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.
 1.6 28-Nov-2007  dyoung branches: 1.6.40; 1.6.44;
Move IN_NEED_CHECKSUM() to in_offload.h for re-use.
 1.5 24-Apr-2007  dyoung branches: 1.5.6; 1.5.8; 1.5.14;
Constify.
 1.4 25-Nov-2006  yamt branches: 1.4.4; 1.4.8; 1.4.10;
move tso-by-software code to their own files. no functional changes.
 1.3 10-Dec-2005  elad branches: 1.3.20; 1.3.22;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.2 10-Aug-2005  yamt branches: 1.2.6;
move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.1 18-Apr-2005  yamt branches: 1.1.2; 1.1.4;
add a function to handle M_CSUM_TSOv4 by software.
 1.1.4.4 07-Dec-2007  yamt sync with head
 1.1.4.3 03-Sep-2007  yamt sync with head.
 1.1.4.2 30-Dec-2006  yamt sync with head.
 1.1.4.1 21-Jun-2006  yamt sync with head.
 1.1.2.2 29-Apr-2005  kent sync with -current
 1.1.2.1 18-Apr-2005  kent file in_offload.h was added on branch kent-audio2 on 2005-04-29 11:29:33 +0000
 1.2.6.3 11-Dec-2005  christos Sync with head.
 1.2.6.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.2.6.1 10-Aug-2005  skrll file in_offload.h was added on branch ktrace-lwp on 2005-11-10 14:11:07 +0000
 1.3.22.1 10-Dec-2006  yamt sync with head.
 1.3.20.1 12-Jan-2007  ad Sync with head.
 1.4.10.1 11-Jul-2007  mjf Sync with head.
 1.4.8.1 08-Jun-2007  ad Sync with head.
 1.4.4.1 07-May-2007  yamt sync with head.
 1.5.14.1 08-Dec-2007  mjf Sync with HEAD.
 1.5.8.1 09-Jan-2008  matt sync with HEAD
 1.5.6.1 03-Dec-2007  joerg Sync with HEAD.
 1.6.44.1 07-Jan-2011  matt If using hardware checksum offload and the packet can't be h/w checksummed
(for whatever reason, some hardware is stupid) allow the driver to calculate
the checksum instead.
 1.6.40.2 31-May-2011  rmind sync with head
 1.6.40.1 05-Mar-2011  rmind sync with head
 1.7.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.8.56.1 10-Jun-2019  christos Sync with HEAD
 1.8.54.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.8.54.1 28-Jul-2018  pgoyette Sync with HEAD
 1.202 04-Nov-2022  ozaki-r ipcb: add/update the description of functions

From rmind-smpnet patches
 1.201 04-Nov-2022  ozaki-r inpcb: replace leading white spaces with tabs
 1.200 04-Nov-2022  ozaki-r inpcb: get rid of parentheses for return value
 1.199 04-Nov-2022  ozaki-r inpcb: use NULL
 1.198 04-Nov-2022  ozaki-r inpcb: use in_port_t for port numbers
 1.197 04-Nov-2022  ozaki-r inpcb: use pool_cache instead of pool
 1.196 04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.195 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.194 29-Oct-2022  ozaki-r inpcb: fix for kernels without INET6
 1.193 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
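
The embedding described above can be sketched as follows: a common structure
sits first in each per-family structure, and an accessor macro reaches the
family-specific field from a pointer to the common part. The layouts and the
_sketch names are hypothetical and heavily simplified; they are not the real
definitions in in_pcb.h.

```c
#include <stdint.h>

/* Hypothetical common part shared by both address families. */
struct inpcb_sketch {
	int inp_af;		/* address family tag */
	uint16_t inp_lport;	/* local port, common to IPv4 and IPv6 */
};

/* IPv4 PCB: common part first, then IPv4-only data. */
struct in4pcb_sketch {
	struct inpcb_sketch in4p_pcb;
	uint32_t in4p_laddr_field;
};

/* IPv6 PCB: common part first, then the larger IPv6-only data. */
struct in6pcb_sketch {
	struct inpcb_sketch in6p_pcb;
	uint8_t in6p_laddr_field[16];
};

/*
 * Accessor in the spirit of in4p_laddr(inp): callers keep passing a
 * pointer to the common structure, and the macro casts it to the
 * enclosing IPv4 structure.  This relies on the common part being the
 * first member.
 */
#define in4p_laddr_sketch(inp) \
	(((struct in4pcb_sketch *)(inp))->in4p_laddr_field)
```

With this arrangement an IPv4 PCB no longer carries the IPv6 address fields,
which is the size reduction the commit message refers to.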
 1.192 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.191 14-Oct-2022  ryo Avoid error of "-Wreturn-local-addr", and simplify the logic.

However, -Wreturn-local-addr is still disabled by default by GCC_NO_RETURN_LOCAL_ADDR
in bsd.own.mk because it causes errors in other parts.
 1.190 29-Aug-2022  knakahara Add sysctl entry to control to send routing message for RTM_DYNAMIC.

Some routing daemons require such routing message to keep coherency.

If we want to let kernel send such message, set net.inet.icmp.dynamic_rt_msg=1
for IPv4, net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
Default(=0) is the same as before, that is, not send such routing message.
 1.189 29-Jul-2022  knakahara Remove obsoleted comments.

These comments are added with IFNET_LOCK by in_pcb.c:r1.180 and
in6_pcb.c:r1.162. And then, IFNET_LOCK codes are removed in
in_pcb.c:r1.183 and in6_pcb.c:r1.166, however the comments have
remained.
 1.188 10-Jun-2022  knakahara Use LIST_FOREACH macro.
 1.187 09-Jun-2022  knakahara refactor: use TAILQ_FOREACH instead of TAILQ_FOREACH_SAFE about inpt_queue.

They don't use "ninph" pointer and don't remove elements.
 1.186 19-Oct-2021  roy netinet: Allow binding the unspecified address when no addresses exist

You should always be able to bind to the unspecified address even if
no addresses have been configured on any interface.

For example, a DHCP client could be started before the loopback interface
has been fully configured.
 1.185 08-Sep-2020  christos Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.184 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.183 15-May-2019  ozaki-r Get rid of IFNET_LOCK for if_mcast_op to avoid a deadlock

The IFNET_LOCK was added to avoid data races on if_flags for IFF_ALLMULTI.
Unfortunately it caused a deadlock instead. A known scenario causing a
deadlock is when the following two operations occur concurrently: (a) a removal of
an IP address assigned to an interface and (b) a manipulation of multicast
groups to the interface. The resource dependency graph is like this:
softnet_lock => IFNET_LOCK => psref_target_destroy => softint => softnet_lock

Thanks to the previous commit that avoids data races on if_flags for
IFF_ALLMULTI by another approach, we can remove IFNET_LOCK and defuse the
deadlock.

PR kern/54189
 1.182 27-Feb-2018  maxv branches: 1.182.4;
Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.181 01-Jan-2018  christos 1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo
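
Item 4 above describes an option whose meaning depends on the size of the
supplied value. A minimal user-space sketch of that dispatch follows. The
structures and set_pktinfo_sketch() are hypothetical, and rejecting an
unrecognised size is this sketch's assumption rather than something the commit
message states.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in_pktinfo; the layout is illustrative. */
struct pktinfo_sketch {
	uint32_t addr;
	unsigned int ifindex;
};

/* Hypothetical per-socket state touched by the option. */
struct sock_opts_sketch {
	int recvpktinfo;			/* IP_RECVPKTINFO-like flag */
	int have_default;			/* default source info set? */
	struct pktinfo_sketch default_info;
};

/*
 * Interpret the option value by its size:
 *  - sizeof(int): Linux-style on/off switch for receiving pktinfo.
 *  - sizeof(struct pktinfo_sketch): Solaris-style default source
 *    interface and/or address for outgoing packets.
 * Returns 0 on success, -1 for any other size (assumption).
 */
static int
set_pktinfo_sketch(struct sock_opts_sketch *so, const void *val, size_t len)
{
	if (len == sizeof(int)) {
		int on;

		memcpy(&on, val, sizeof(on));
		so->recvpktinfo = (on != 0);
		return 0;
	}
	if (len == sizeof(struct pktinfo_sketch)) {
		memcpy(&so->default_info, val, sizeof(so->default_info));
		so->have_default = 1;
		return 0;
	}
	return -1;
}
```

The size test works here only because the two value types have different
sizes, which is also what the commit message relies on.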
 1.180 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.179 10-Aug-2017  ryo Add support IP_PKTINFO for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.178 25-Apr-2017  ozaki-r branches: 1.178.4;
Check if solock of PCB is held when SP caches in the PCB are accessed

To this end, a back pointer from inpcbpolicy to inpcb_hdr is added.
 1.177 20-Apr-2017  ozaki-r Simplify logic of udp4_sendup and udp6_sendup

They are always passed a socket with the same protocol family
as its own: AF_INET for udp4_sendup and AF_INET6 for udp6_sendup.
 1.176 02-Mar-2017  ozaki-r Make sure imo_membership is protected by inp's lock (solock)
 1.175 13-Feb-2017  ozaki-r Replace splnet with splsoftnet
 1.174 23-Jan-2017  ozaki-r Get rid of splnet for pool(9)

We don't need it anymore.
 1.173 11-Jan-2017  ozaki-r branches: 1.173.2;
Get rid of unnecessary header inclusions
 1.172 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.171 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes difficult to debug by bisecting.
 1.170 29-Sep-2016  roy Now that we disallow sending or receiving from invalid addresses,
allow binding to tentative addresses.
 1.169 26-Aug-2016  roy Allow bind to detached INET addresses.
 1.168 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.167 20-Jul-2016  ozaki-r Reduce scopes of variables
 1.166 08-Jul-2016  ozaki-r branches: 1.166.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.165 06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.
 1.164 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time while we reach it via them. Thus we need to look up an ifnet
object by if_index every time to be safe.
 1.163 15-Feb-2016  rtr Reduce code duplication.

Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended. As posted to tech-net@
 1.162 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.161 24-May-2015  rtr remove transitional functions in{,6}_pcbconnect_m() that were used in
converting protocol user requests to accept sockaddr instead of mbufs.

remove tcp_input copy in to mbuf from sockaddr and just copy to sockaddr
to make it possible for the transitional functions to go away.

no version bump since these functions only existed for a short time and
were commented as adapters (they appeared in 7.99.15).
 1.160 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.159 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.158 26-Apr-2015  rtr return EINVAL if sin{,6}_len != sizeof(sockaddr_in{,6}) respectively in
in{,6}_pcbconnect().

checking just m->m_len isn't enough because there are various places that
assume sa_len has been properly populated.
 1.157 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.156 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.155 25-Nov-2014  seanb branches: 1.155.2;
Really make SO_REUSEPORT and SO_REUSEADDR equivalent for multicast
sockets. From FreeBSD.
 1.154 25-Nov-2014  seanb Clean up any dangling ifp references in (struct in6pcb *)->in6p_v4moptions
(v4 multicast options off v4 mapped v6 socket) on interface destruction. The
code to clean this up in a true v4 socket was moved to its own function
which is now also called in the corresponding place for v6 sockets on
interface destruction.
 1.153 10-Nov-2014  maxv Do not uselessly include <sys/malloc.h>.
 1.152 07-Sep-2014  rmind in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe). Same for IPv6 routine.

This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
 1.151 05-Aug-2014  rtr branches: 1.151.2;
revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.150 03-Aug-2014  rmind in_pcbdetach: now that IGMP and multicast groups are MP-safe, we can move
the ip_freemoptions() call outside the softnet_lock. Should fix PR/49065.
 1.149 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.148 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.147 22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.146 23-Nov-2013  christos branches: 1.146.2;
convert from CIRCLEQ to TAILQ.
 1.145 05-Jun-2013  christos branches: 1.145.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.144 12-Apr-2013  christos PR/47738: connect(2) to 239.x.y.z should return error but does not.
 1.143 25-Jun-2012  christos branches: 1.143.2;
rename rfc6056 -> portalgo, requested by yamt
 1.142 21-Jun-2012  yamt constify, comments.
no functional changes.
 1.141 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.140 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.139 24-Sep-2011  christos branches: 1.139.2; 1.139.6;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.138 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
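
As a rough sketch of the class-to-MSL mapping described above (all names
here are hypothetical and do not come from the commit; only the default
values of 2, 10 and 60 seconds are taken from the text), assuming the
caller already knows whether the peer is the local host or on the same
subnet:

```c
#include <stdbool.h>

/* Hypothetical names; the defaults are the ones quoted in the log text. */
enum example_mslt_class {
	EXAMPLE_MSLT_LOOPBACK,	/* local host equals remote host */
	EXAMPLE_MSLT_LOCAL,	/* same link/subnet */
	EXAMPLE_MSLT_REMOTE	/* reached via one or more gateways */
};

static const int example_msl_seconds[] = {
	[EXAMPLE_MSLT_LOOPBACK] = 2,
	[EXAMPLE_MSLT_LOCAL] = 10,
	[EXAMPLE_MSLT_REMOTE] = 60,
};

/* Pick the nearest class that applies to the peer. */
static enum example_mslt_class
example_mslt_classify(bool same_host, bool same_subnet)
{
	if (same_host)
		return EXAMPLE_MSLT_LOOPBACK;
	if (same_subnet)
		return EXAMPLE_MSLT_LOCAL;
	return EXAMPLE_MSLT_REMOTE;
}

/* A session uses the MSL of its class. */
static int
example_mslt_msl(bool same_host, bool same_subnet)
{
	return example_msl_seconds[example_mslt_classify(same_host, same_subnet)];
}
```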

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
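
A simplified sketch of two of the ideas above: links stored as narrow
indices into a fixed-size pool rather than as pointers, and reuse of the
oldest entry when the pool is full. This is not the VTW code; the hash
table is omitted, and every name and size below is an assumption made
for illustration.

```c
#include <stdint.h>

#define EXAMPLE_POOL_SIZE 4
#define EXAMPLE_NIL UINT16_MAX	/* "null" index */

struct example_vestigial {
	uint32_t key;	/* stand-in for the connection identity */
	uint16_t next;	/* index of the next-newer entry, not a pointer */
};

struct example_pool {
	struct example_vestigial slot[EXAMPLE_POOL_SIZE];
	uint16_t oldest;	/* head of the age-ordered list */
	uint16_t newest;	/* tail of the age-ordered list */
	uint16_t used;		/* number of slots handed out so far */
};

static void
example_pool_init(struct example_pool *p)
{
	p->oldest = p->newest = EXAMPLE_NIL;
	p->used = 0;
}

/* Insert a key; when the pool is full, reuse the oldest slot. */
static uint16_t
example_pool_insert(struct example_pool *p, uint32_t key)
{
	uint16_t idx;

	if (p->used < EXAMPLE_POOL_SIZE) {
		idx = p->used++;
	} else {
		idx = p->oldest;	/* discard oldest first */
		p->oldest = p->slot[idx].next;
		if (p->oldest == EXAMPLE_NIL)
			p->newest = EXAMPLE_NIL;
	}
	p->slot[idx].key = key;
	p->slot[idx].next = EXAMPLE_NIL;
	if (p->newest != EXAMPLE_NIL)
		p->slot[p->newest].next = idx;	/* append after the tail */
	else
		p->oldest = idx;		/* list was empty */
	p->newest = idx;
	return idx;
}
```

A 16-bit index costs a quarter of a 64-bit pointer per link, which is
the kind of saving the text refers to.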

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.137 12-May-2009  elad branches: 1.137.4; 1.137.6;
Implicit EPERM -> explicit EACCES.

Requested by ad@ and yamt@.
 1.136 09-May-2009  elad Add check for IN_MULTICAST() that was taken only to in_pcbbind_port() --
it's necessary in in_pcbbind_addr() as well.

Pointed out by Mihai Chelaru on tech-net@, thanks!
 1.135 30-Apr-2009  elad Commit changes to netinet6/in6_src.c, forgot in previous commit:

http://mail-index.netbsd.org/source-changes/2009/04/30/msg220547.html

Make in_pcbsetport() set the port number selected before passing "sin" to
kauth(9).
 1.134 30-Apr-2009  elad - Make in6_pcbbind_{addr,port}() static

- Properly authorize port binding in in_pcbsetport() and in6_pcbsetport()

- Pass struct sockaddr_in6 to in6_pcbsetport() instead of just the address,
so that we have a more complete context

- Adjust udp6_output() to craft a sockaddr_in6 as it calls in6_pcbsetport()

- Fix an issue in in_pcbbind() where we used the "dom_sa_any" pointer and
not a copy of it, pointed out by bouyer@, thanks!

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/29/msg001259.html
 1.133 23-Apr-2009  elad - Make kauth(9) call logic match the one in netinet6/in6_pcb.c

- Indent a comment
 1.132 23-Apr-2009  elad Some changes to in_pcbbind():

- Extract guts to in_pcbbind_{addr,port}()

- Put the port auto-assignment logic in in_pcbsetport(), which looks very
similar to in6_pcbsetport()

- Fix a bug where "sin" was passed to kauth(9) without being set to
anything

No objections on tech-net@.
 1.131 14-Apr-2009  elad Don't set sin->sin_port and sin6->sin6_port to 0 before calling
ifa_ifwithaddr(), as we no longer do a byte compare on the entire struct.

Reviewed by and okay from dyoung@.
 1.130 18-Mar-2009  cegger bzero -> memset
 1.129 11-Oct-2008  pooka branches: 1.129.2; 1.129.4; 1.129.8; 1.129.10;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.
 1.128 03-Oct-2008  pooka Hallo, pool_init(). Auf wiedersehen & byebye, link set POOL_INIT().
 1.127 04-Aug-2008  spz typo fix in comment (drops the ' in drop's :)
 1.126 04-Aug-2008  matt Free the socket only after disposing of the PCB.
 1.125 05-May-2008  ad branches: 1.125.2; 1.125.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.124 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.123 24-Apr-2008  ad branches: 1.123.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.122 14-Jan-2008  dyoung branches: 1.122.6; 1.122.8;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
 1.121 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.120 16-Dec-2007  elad Really fix low port allocation, by always passing a valid lwp to
in_pcbbind().

Okay dyoung@.

Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
 1.119 21-Aug-2007  dyoung branches: 1.119.2; 1.119.8; 1.119.10; 1.119.14;
Use sockaddr_in_init().
 1.118 19-Jul-2007  dyoung branches: 1.118.4; 1.118.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.117 02-May-2007  dyoung branches: 1.117.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.116 12-Mar-2007  ad branches: 1.116.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.115 04-Mar-2007  christos branches: 1.115.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.114 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.113 26-Jan-2007  dyoung branches: 1.113.2;
KNF: bzero -> memset, change (struct in_ifaddr *)0 to NULL.
 1.112 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.111 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.110 08-Dec-2006  joerg When a dynamic route is deleted in in_losing and in6_losing, rtrequest
is called, but the current reference via the PCB is not removed. This
is effectively a leaked reference. Call rtfree unconditionally.
 1.109 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.108 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which are built only under #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
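
The address predicates in item 3 can be sketched as follows. These are
hypothetical stand-ins written as functions, not the kernel's macro
definitions, and they assume the address is given in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; addresses are in host byte order. */

/* True for link-local unicast, 169.254.0.0/16. */
static bool
example_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* True for the three RFC 1918 private ranges. */
static bool
example_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||	/* 10.0.0.0/8 */
	    (a & 0xfff00000u) == 0xac100000u ||	/* 172.16.0.0/12 */
	    (a & 0xffff0000u) == 0xc0a80000u;		/* 192.168.0.0/16 */
}
```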
 1.107 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.106 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.105 19-Sep-2006  elad Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.
 1.104 08-Sep-2006  elad branches: 1.104.2;
First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.103 23-Jul-2006  ad branches: 1.103.4;
Use the LWP cached credentials where sane.
 1.102 14-May-2006  elad integrate kauth.
 1.101 15-Nov-2005  dsl branches: 1.101.4; 1.101.6; 1.101.8; 1.101.10; 1.101.12;
Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that it can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.100 29-May-2005  christos branches: 1.100.2; 1.100.8;
- add const
- remove bogus casts
- avoid nested variables
 1.99 07-May-2005  christos PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
 1.98 03-Feb-2005  perry ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.97 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.96 29-Sep-2004  christos branches: 1.96.4; 1.96.6;
PR/27082: Sean Boudreau: redundant assignment or NULL dereference in
in_pcbconnect()
 1.95 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.94 02-Mar-2004  thorpej Call ipsec_pcbconn() and ipsec_pcbdisconn() for FAST_IPSEC, too.
 1.93 13-Jan-2004  itojun avoid deref-after-free.
http://sources.zabbadoz.net/freebsd/patchset/106-ipsec-pcb-discon.diff
 1.92 02-Jan-2004  itojun whitespace
 1.91 11-Nov-2003  jonathan Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
 1.90 28-Oct-2003  provos use a hash table to bind to local ports; suggested by markus friedl
approved: fvdl@
 1.89 23-Oct-2003  mycroft Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
 1.88 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.87 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.86 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.85 22-Jul-2003  itojun avoid code dup when check broadcast addr in bind(2)
 1.84 21-Jul-2003  itojun permit bind(2) to broadcast address, as it was permitted before.
(for instance, "ntpd -b" was broken since revision 1.82)
found report on http://pc.2ch.net/unix
 1.83 26-Jun-2003  itojun branches: 1.83.2;
check if INADDR_TO_IA gets us valid in_ifaddr or not. hopefully fix PR21964
 1.82 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.81 16-Mar-2003  lukem Enable check in in_pcbbind() to enforce sin_family == AF_INET.
If there are any "old programs which incorrectly set this" left,
they will now fail with EAFNOSUPPORT.
This makes in_pcbbind() consistent with in_pcbconnect() and the other
protocol families.

As per my PR [kern/4441], which has the comment:
Stevens' "TCP/IP Illustrated, Volume 2", page 730, notes that
in_pcbbind() has the check which determines if sin_family == AF_INET
commented out, but the same check in in_pcbconnect() is still active.
 1.80 22-Oct-2002  simonb "error" in in_pcbbind() was only ever set but not used, remove it.
 1.79 11-Jun-2002  itojun share policy-on-pcb for listening socket. sync w/kame
todo: share even more, avoid frequent updates of spidx
 1.78 09-Jun-2002  itojun whitespace
 1.77 28-May-2002  itojun correct in*_pcbrtentry. check cached value correctly.
 1.76 28-May-2002  itojun in in*_pcbrtentry(), check if route is still valid (RTF_UP),
and address family is still valid.
 1.75 08-Mar-2002  thorpej branches: 1.75.6;
Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.74 22-Jan-2002  itojun make sure to check address family on route cache. with IPv4 mapped
address we can see both AF_INET/INET6.
 1.73 13-Nov-2001  lukem add RCSIDs
 1.72 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.71 06-Aug-2001  itojun branches: 1.71.4;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
 1.70 25-Jul-2001  itojun allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone access to the pcb (do not reveal an inconsistent one).
sync with kame
 1.69 02-Jul-2001  itojun branches: 1.69.2;
on interface removal, remove multicast groups joined from pcb, before
removing interface addresses. without the change, we may deref
NULL pointer in in_pcbpurgeif(). from jinmei@kame, sync with kame
 1.68 08-Nov-2000  ad branches: 1.68.2;
Update for hashinit() change.
 1.67 25-Aug-2000  tron Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.
 1.66 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.65 03-Apr-2000  enami branches: 1.65.4;
- Unselect the multicast outgoing interface if it is being detached.
- Drop the multicast membership if we are joining through the interface
being detached.
 1.64 30-Mar-2000  augustss Remove register declarations.
 1.63 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.62 01-Feb-2000  thorpej Small amount of cosmetic cleanup.
 1.61 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.60 09-Jul-1999  thorpej branches: 1.60.2; 1.60.8;
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.59 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.58 23-Mar-1999  lukem branches: 1.58.4; 1.58.6;
Ensure that you can only bind a more specific address when it is done by the
same uid or by root.

This code is from FreeBSD. (Whilst it was originally obtained from OpenBSD,
FreeBSD fixed it to work with multicast. To quote the commit message:
- Don't bother checking for conflicting sockets if we're binding to a
multicast address.
- Don't return an error if we're binding to INADDR_ANY, the conflicting
socket is bound to INADDR_ANY, and the conflicting socket has
SO_REUSEPORT set.
)
 1.57 19-Dec-1998  thorpej Reverse the copyright-notice-swap. It went against existing practice.
 1.56 16-Nov-1998  lukem branches: 1.56.2;
if INADDR_ANY is given in in_pcbconnect(), choose the ia_addr of the first
interface, not the ia_broadaddr. should fix [standards/5645] and [kern/6425]
 1.55 13-Nov-1998  lukem simplify test in in_pcbbind() for setting wild=1; no need to check if
((so->so_proto->pr_flags & PR_CONNREQUIRED) == 0 ||
(so->so_options & SO_ACCEPTCONN) == 0)
since the latter is always true, so the former test is unnecessary.
from `TCP/IP Illustrated, Volume 2', W. Richard Stevens, p 730.
 1.54 05-Oct-1998  lukem * in_pcblookup_port(): deprecate INPLOOKUP_WILDCARD and flags in favour
of a lookup_wildcard arg; simplifies the logic a bit.
* when assigning ephemeral ports in in_pcbbind(), always call
in_pcblookup_port() with lookup_wildcard=1, so that ephemeral port
allocation on sockets with SO_REUSEADDR set won't potentially bind to a
port in use by something else (principle of least surprise).
 1.53 30-Sep-1998  tls Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.
 1.52 02-Aug-1998  thorpej Use the pool allocator for inpcbs.
 1.51 23-Jul-1998  pk in_pcballoc(): we can't afford to wait for memory.
 1.50 15-Feb-1998  tls Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.
 1.49 13-Feb-1998  tls Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
 1.48 07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.47 08-Jan-1998  lukem * start from the top of the given ephemeral range and work down;
results in reserved ephemeral ports starting at the top (as per
current practice), and shouldn't have a negative effect on normal
ephemeral ports...
* initialise inpt_lastlow in in_pcbinit
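The top-down search described in revision 1.47 might look roughly like the sketch below. It is an assumption-laden illustration, not the in_pcbbind() code: the names `pick_ephemeral_port` and `port_is_used` are hypothetical, and the set of busy ports is modelled as a plain array instead of a PCB table lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCB table lookup: linear scan of busy ports. */
bool
port_is_used(uint16_t port, const uint16_t *used, size_t nused)
{
	for (size_t i = 0; i < nused; i++)
		if (used[i] == port)
			return true;
	return false;
}

/*
 * Walk the range [min, max] downward, starting just below *last and
 * wrapping from min back to max. Each port is probed at most once.
 * Returns the chosen port and records it in *last, or returns 0 if
 * every port in the range is busy.
 */
uint16_t
pick_ephemeral_port(uint16_t min, uint16_t max, uint16_t *last,
    const uint16_t *used, size_t nused)
{
	assert(min != 0 && min <= max);

	uint32_t count = (uint32_t)max - min + 1;
	uint16_t port = *last;

	for (uint32_t i = 0; i < count; i++) {
		/* Step down; an out-of-range or bottom value restarts at max. */
		if (port <= min || port > max)
			port = max;
		else
			port = (uint16_t)(port - 1);
		if (!port_is_used(port, used, nused)) {
			*last = port;
			return port;
		}
	}
	return 0;
}
```

Initialising `*last` to a value outside the range (0 here) makes the first allocation land on the top of the range, which matches the "start from the top and work down" behaviour the commit describes.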
 1.46 08-Jan-1998  lukem add missing ; ...
 1.45 07-Jan-1998  lukem add the following, derived from FreeBSD:
* IP_PORTRANGE socket option, which controls how the ephemeral ports
are allocated. it takes the following settings:
IP_PORTRANGE_DEFAULT use anonportmin (49152) -> anonportmax (65535)
IP_PORTRANGE_HIGH as IP_PORTRANGE_DEFAULT (retained for FreeBSD
compat reasons, where these are separate)
IP_PORTRANGE_LOW use 600 -> 1023. only works if uid==0.
* in_pcb flag INP_ANONPORT. set if port was allocated ephemerally
 1.44 05-Jan-1998  thorpej Finish merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.43 05-Jan-1998  lukem enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
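The constraints that revision 1.43 places on anonportmin and anonportmax can be sketched as a small validation routine. This is a sketch under stated assumptions: the function name, the `SKETCH_` constants, and the choice of EINVAL as the error are all hypothetical, and the real sysctl handler is not shown in this log.

```c
#include <errno.h>

/* Hypothetical constants standing in for IPPORT_RESERVED and the port ceiling. */
#define SKETCH_IPPORT_RESERVED	1024
#define SKETCH_PORT_MAX		65535

/*
 * Validate a proposed (anonportmin, anonportmax) pair:
 *  - neither value may exceed 65535,
 *  - neither may fall below the reserved boundary, unless the build
 *    defines IPNOPRIVPORTS,
 *  - the minimum must be strictly below the maximum.
 * Returns 0 if acceptable, EINVAL otherwise (error code is an assumption).
 */
int
anonport_range_check(int newmin, int newmax)
{
	if (newmin > SKETCH_PORT_MAX || newmax > SKETCH_PORT_MAX)
		return EINVAL;
#ifndef IPNOPRIVPORTS
	if (newmin < SKETCH_IPPORT_RESERVED || newmax < SKETCH_IPPORT_RESERVED)
		return EINVAL;
#endif
	if (newmin < 0 || newmin >= newmax)
		return EINVAL;
	return 0;
}
```

With the default build, the range 49152 to 65535 passes, while a range that dips into the reserved ports or has its bounds reversed is rejected.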
 1.42 30-Dec-1997  lukem as per the IANA assigned ports numbers document, use ports
49152..65535 for ephemeral ports (instead of 1024..5000).
closes my [kern/4440], but with correct code :)
 1.41 27-Nov-1997  mrg fix compile error when "options IPNOPRIVPORTS"
 1.40 20-Nov-1997  thorpej Deal with a problem where ephemeral port shortage would cause a PCB's
local address to be set, causing all further attempts to bind that PCB
to fail. From Koji Imada, PR #3857.
 1.39 14-Oct-1997  matt branches: 1.39.2;
Add support for returning maximum supported MTU when ip_output fails with
EMSGSIZE.
 1.38 22-Sep-1997  thorpej Implement in_pcbrtentry() - return the route associated with a PCB. If
one does not exist, attempt to allocate one. This is mostly pulled from
tcp_input.c.
 1.37 23-Jul-1997  thorpej branches: 1.37.2;
Pull SYN_cache_branch down into the main line.
 1.36 10-Dec-1996  mycroft branches: 1.36.8;
Return EAGAIN if binding with no specified port and the pool is empty.
 1.35 13-Oct-1996  christos backout previous kprintf changes
 1.34 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.33 15-Sep-1996  mycroft Hash unconnected PCBs.
 1.32 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.31 05-Sep-1996  perry Commit PR 2671, which adds an "IPNOPRIVPORTS" config option that turns
off the code that normally only allows root to bind low TCP
ports. Useful on firewalls and such.
 1.30 14-Aug-1996  thorpej Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.
 1.29 10-Jul-1996  cgd print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)
 1.28 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.27 26-Feb-1996  mrg branches: 1.27.4;
two more local addr changes, all done differently now (idea from charles)
 1.26 26-Feb-1996  mrg if we are connecting *to* an address of any local interface, default the
local address of the socket to the same address.
 1.25 13-Feb-1996  christos netinet prototypes
 1.24 31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.23 17-Aug-1995  mycroft branches: 1.23.2;
so_pcb should be a void *.
 1.22 12-Aug-1995  mycroft splnet --> splsoftnet
 1.21 18-Jun-1995  cgd convert pcb lists to CIRCLEQs, so that the end can be looked at more
easily, and so that the original (insque/remque) logic can be effectively
mimicked. (This fixes a bug in the previous set of list changes.)
also (since terminator is no longer null) reinstate uninitted list checks,
but mark them XXX.
 1.20 12-Jun-1995  mycroft in_pcbnotify*() don't return anything.
 1.19 12-Jun-1995  mycroft Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.
 1.18 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.17 04-Jun-1995  mycroft Remove one more bogus cast.
 1.16 04-Jun-1995  mycroft Don't cast things unnecessarily.
 1.15 04-Jun-1995  mycroft Clean up many more casts.
 1.14 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.13 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.12 29-Sep-1994  deraadt failure to bind to a reserved port should return EACCES not EPERM.
 1.11 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.10 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.9 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.8 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.7 18-Dec-1993  mycroft Canonicalize all #includes.
 1.6 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.5 11-Jun-1993  deraadt The latest patch was hosed. There is some program that I used which
left extra crud at the end of the file. I blame ftpd for not doing an
ftruncate().
 1.4 10-Jun-1993  deraadt patch from Yuval Yarom, sent to me by <andrew@werple.apana.org.au>
they say: When doing an implicit bind in_pcbbind will assign used ports
if the port is bound on specific interface, and not on INADDR_ANY.
Effects of the bug range from connection drops to machine hangs.
 1.3 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.23.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.27.4.2 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.27.4.1 10-Dec-1996  mycroft From trunk:
Return EAGAIN if binding with no specified port and the pool is empty.
 1.36.8.1 14-May-1997  mellon in_pcbnotify() returns a result indicating whether or not it actually matched a pcb.
 1.37.2.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.37.2.1 29-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.39.2.4 18-Jan-1999  cgd pull up rev 1.56 from trunk (PR#5645 and PR#6425). (lukem)
 1.39.2.3 01-Oct-1998  cgd pull up revisions 1.49-1.50, 1.53 (via patch) from trunk. (tls)
 1.39.2.2 28-Nov-1997  mellon Pull rev 1.41 up from trunk (mrg)
 1.39.2.1 20-Nov-1997  thorpej Fix PCB binding problem caused by ephemeral port shortage.
 1.56.2.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.58.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.58.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.58.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.58.4.2 02-Aug-1999  thorpej Update from trunk.
 1.58.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.60.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.60.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.60.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.65.4.1 26-Aug-2000  tron Pull up from current (approved by thorpej):

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.

syssrc/sys/netinet/in.h 1.49 -> 1.50
syssrc/sys/netinet/in_pcb.c 1.66 -> 1.67
syssrc/sys/netinet/ip_input.c 1.116 -> 1.117
syssrc/sys/netinet/ip_var.h 1.41 -> 1.42
 1.68.2.6 11-Nov-2002  nathanw Catch up to -current
 1.68.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.68.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.68.2.3 28-Feb-2002  nathanw Catch up to -current.
 1.68.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.68.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.69.2.6 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.69.2.5 16-Mar-2002  jdolecek Catch up with -current.
 1.69.2.4 11-Feb-2002  jdolecek Sync w/ -current.
 1.69.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.69.2.2 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.69.2.1 03-Aug-2001  lukem update to -current
 1.71.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.75.6.2 20-Jun-2002  gehenna catch up with -current.
 1.75.6.1 30-May-2002  gehenna Catch up with -current.
 1.83.2.7 11-Dec-2005  christos Sync with head.
 1.83.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.83.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.83.2.4 19-Oct-2004  skrll Sync with HEAD
 1.83.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.83.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.83.2.1 03-Aug-2004  skrll Sync with HEAD
 1.96.6.1 12-Feb-2005  yamt sync with head.
 1.96.4.1 29-Apr-2005  kent sync with -current
 1.100.8.1 22-Nov-2005  yamt sync with head.
 1.100.2.5 21-Jan-2008  yamt sync with head
 1.100.2.4 03-Sep-2007  yamt sync with head.
 1.100.2.3 26-Feb-2007  yamt sync with head.
 1.100.2.2 30-Dec-2006  yamt sync with head.
 1.100.2.1 21-Jun-2006  yamt sync with head.
 1.101.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.101.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.101.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.101.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.101.8.3 14-Sep-2006  yamt sync with head.
 1.101.8.2 11-Aug-2006  yamt sync with head
 1.101.8.1 24-May-2006  yamt sync with head.
 1.101.6.1 01-Jun-2006  kardel Sync with head.
 1.101.4.4 09-Sep-2006  rpaulo sync with head
 1.101.4.3 14-Mar-2006  rpaulo Remove last reference to in6pcb.
 1.101.4.2 14-Feb-2006  rpaulo Remove INPCBHASH_PORT, INPCBHASH_BIND, INPCBHASH_CONNECT (moved to
in_pcb.h).
If INET6, detect whether the pcb is v4 or v6 based on the socket
family (from FreeBSD).
 1.101.4.1 02-Feb-2006  rpaulo Remove #include netinet6/in6_pcb.h.
 1.103.4.3 01-Feb-2007  ad Sync with head.
 1.103.4.2 12-Jan-2007  ad Sync with head.
 1.103.4.1 18-Nov-2006  ad Sync with head.
 1.104.2.3 18-Dec-2006  yamt sync with head.
 1.104.2.2 10-Dec-2006  yamt sync with head.
 1.104.2.1 22-Oct-2006  yamt sync with head
 1.113.2.4 07-May-2007  yamt sync with head.
 1.113.2.3 24-Mar-2007  yamt sync with head.
 1.113.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.113.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.115.2.4 09-Oct-2007  ad Sync with head.
 1.115.2.3 20-Aug-2007  ad Sync with HEAD.
 1.115.2.2 08-Jun-2007  ad Sync with head.
 1.115.2.1 13-Mar-2007  ad Sync with head.
 1.116.2.1 11-Jul-2007  mjf Sync with head.
 1.117.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.117.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.118.6.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.118.6.1 19-Jul-2007  dyoung file in_pcb.c was added on branch matt-mips64 on 2007-07-19 20:48:55 +0000
 1.118.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.119.14.2 19-Jan-2008  bouyer Sync with HEAD
 1.119.14.1 02-Jan-2008  bouyer Sync with HEAD
 1.119.10.1 26-Dec-2007  ad Sync with head.
 1.119.8.1 18-Feb-2008  mjf Sync with HEAD.
 1.119.2.2 23-Mar-2008  matt sync with HEAD
 1.119.2.1 09-Jan-2008  matt sync with HEAD
 1.122.8.1 18-May-2008  yamt sync with head.
 1.122.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.122.6.3 05-Oct-2008  mjf Sync with HEAD.
 1.122.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.122.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.123.2.3 16-May-2009  yamt sync with head
 1.123.2.2 04-May-2009  yamt sync with head.
 1.123.2.1 16-May-2008  yamt sync with head.
 1.125.6.1 19-Oct-2008  haad Sync with HEAD.
 1.125.2.2 10-Oct-2008  skrll Sync with HEAD.
 1.125.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.129.10.1 10-May-2009  snj branches: 1.129.10.1.2;
Apply patch (requested by sborrill in ticket #745):
Fix compilation with IPNOPRIVPORTS option.
 1.129.10.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.129.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.129.4.1 10-May-2009  snj Apply patch (requested by sborrill in ticket #745):
Fix compilation with IPNOPRIVPORTS option.
 1.129.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.137.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.137.4.1 31-May-2011  rmind sync with head
 1.139.6.2 05-Apr-2012  mrg sync to latest -current.
 1.139.6.1 18-Feb-2012  mrg merge to -current.
 1.139.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.139.2.2 30-Oct-2012  yamt sync with head
 1.139.2.1 17-Apr-2012  yamt sync with head
 1.143.2.3 03-Dec-2017  jdolecek update from HEAD
 1.143.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.143.2.1 23-Jun-2013  tls resync from head
 1.145.2.4 18-May-2014  rmind sync with head
 1.145.2.3 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.145.2.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.145.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.146.2.1 10-Aug-2014  tls Rebase.
 1.151.2.2 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.151.2.1 08-Sep-2014  msaitoh Pull up following revision(s) (requested by rmind in ticket #80):
sys/netinet6/in6_pcb.c: revision 1.129
sys/netinet/in_pcb.c: revision 1.152
in_pcbdetach: move ip_freemoptions() under softnet_lock for now (this will
be changed back once other IP paths become MP-safe). Same for IPv6 routine.
This partially reverts 1.150 of in_pcb.c and 1.127 of in6_pcb.c changes.
 1.155.2.8 28-Aug-2017  skrll Sync with HEAD
 1.155.2.7 05-Feb-2017  skrll Sync with HEAD
 1.155.2.6 05-Oct-2016  skrll Sync with HEAD
 1.155.2.5 09-Jul-2016  skrll Sync with HEAD
 1.155.2.4 19-Mar-2016  skrll Sync with HEAD
 1.155.2.3 22-Sep-2015  skrll Sync with HEAD
 1.155.2.2 06-Jun-2015  skrll Sync with HEAD
 1.155.2.1 06-Apr-2015  skrll Sync with HEAD
 1.166.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.166.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.166.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.166.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.166.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.166.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.173.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.178.4.3 18-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #639):
sys/kern/uipc_socket.c: revision 1.258
sys/kern/uipc_socket.c: revision 1.259
sys/netinet/ip_input.c: revision 1.364 (via patch)
sys/netinet/ip_output.c: revision 1.289
sys/netinet/in.h: revision 1.102
sys/netinet/in_pcb.c: revision 1.181
share/man/man9/sockopt.9: revision 1.11
sys/netinet/in_pcb.h: revision 1.65
sys/sys/socketvar.h: revision 1.146
sys/kern/uipc_syscalls.c: revision 1.189
sys/netinet/ip_output.c: revision 1.290
share/man/man4/ip.4: revision 1.41
share/man/man4/ip.4: revision 1.42
sys/kern/uipc_syscalls.c: revision 1.190

pass valsize for getsockopt like we do for setsockopt
make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo

new sentence-new line

Remove comment now that the getsockopt code passes the size.

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).
(Tom Ivar Helbekkmo)
 1.178.4.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
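
The wrapper-macro idea described above can be sketched in user space as follows. This is only an illustration: the `MODEL_*` macro names and the counter standing in for the kernel lock are hypothetical, not the definitions added by the commit.

```c
#include <assert.h>

/* Stand-in for the big kernel lock: a plain hold count (illustrative only). */
int kernel_lock_depth = 0;

void model_kernel_lock(void)
{
    kernel_lock_depth++;
}

void model_kernel_unlock(void)
{
    assert(kernel_lock_depth > 0);
    kernel_lock_depth--;
}

/*
 * Hypothetical wrapper macros: they expand to lock operations only when
 * NET_MPSAFE is not defined, so call sites need no #ifndef of their own.
 */
#ifndef NET_MPSAFE
#define MODEL_KERNEL_LOCK_UNLESS_NET_MPSAFE()   model_kernel_lock()
#define MODEL_KERNEL_UNLOCK_UNLESS_NET_MPSAFE() model_kernel_unlock()
#else
#define MODEL_KERNEL_LOCK_UNLESS_NET_MPSAFE()   do { } while (0)
#define MODEL_KERNEL_UNLOCK_UNLESS_NET_MPSAFE() do { } while (0)
#endif

/* A call site: returns the lock depth observed inside the protected region. */
int model_protected_operation(void)
{
    int depth_inside;

    MODEL_KERNEL_LOCK_UNLESS_NET_MPSAFE();
    depth_inside = kernel_lock_depth;
    MODEL_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
    return depth_inside;
}
```

With this shape, searching for the macro names finds every place that still takes the lock when the stack is built MP-safe.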
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is not set, hold the lock; otherwise don't hold it.
This change requires adding KERNEL_LOCK to functions called from
if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel lock by default.
 1.178.4.1 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.182.4.1 10-Jun-2019  christos Sync with HEAD
 1.76 04-Nov-2022  ozaki-r inpcb: use in_port_t for port numbers
 1.75 04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.74 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.73 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
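
The embedding scheme described above can be sketched roughly as follows. The `model_*` structures and their fields are hypothetical simplifications, not the kernel's real layouts; only the idea of a common part embedded first and reached through an accessor macro is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Common part shared by both address families (fields are illustrative). */
struct model_inpcb {
    int      inp_af;     /* address family tag */
    uint16_t inp_lport;  /* local port */
};

/* IPv4 PCB: embeds the common part first, then IPv4-only data. */
struct model_in4pcb {
    struct model_inpcb in4p_pcb;
    uint32_t           in4p_laddr_v;      /* local IPv4 address */
};

/* IPv6 PCB: same common part, larger family-specific data. */
struct model_in6pcb {
    struct model_inpcb in6p_pcb;
    uint8_t            in6p_laddr_v[16];  /* local IPv6 address */
};

/*
 * Accessor macro: callers keep passing the common pointer and never spell
 * out the containing type. The cast is valid because the common part is
 * the first member of the containing structure.
 */
#define model_in4p_laddr(inp) \
    (((struct model_in4pcb *)(inp))->in4p_laddr_v)
```

Because IPv4 sockets allocate only the smaller structure, they no longer pay for the IPv6-specific fields.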
 1.72 28-Oct-2022  ozaki-r Remove in_pcb_hdr.h
 1.71 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.70 10-Jun-2022  knakahara "inp_hash" is not used now.
 1.69 08-Sep-2020  christos Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.68 28-Aug-2020  riastradh netinet: Include the needful so include order doesn't matter.
 1.67 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.66 31-May-2018  maxv Remove support for non-IKE markers in the kernel. Discussed on tech-net@,
and now in PR/53334. Basically non-IKE markers come from a deprecated
draft, and our kernel code for them has never worked.

Setsockopt will now reject UDP_ENCAP_ESPINUDP_NON_IKE.

Perhaps we should also add a check in key_handle_natt_info(), to make
sure we also reject UDP_ENCAP_ESPINUDP_NON_IKE in the SADB.
 1.65 01-Jan-2018  christos branches: 1.65.2;
1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo
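
The size-based dispatch in item 4 can be modelled as below. This is a sketch under stated assumptions: `struct model_pktinfo` is a stand-in for the real `struct in_pktinfo`, the function and enum names are hypothetical, and rejecting any other size is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for struct in_pktinfo (real one: <netinet/in.h>). */
struct model_pktinfo {
    uint32_t ipi_addr;     /* source address */
    unsigned ipi_ifindex;  /* interface index */
};

enum model_pktinfo_action {
    MODEL_RECVPKTINFO_ON,   /* Linux-style int, non-zero */
    MODEL_RECVPKTINFO_OFF,  /* Linux-style int, zero */
    MODEL_SET_DEFAULT_SRC,  /* Solaris-style struct */
    MODEL_REJECT            /* any other size (assumption of this sketch) */
};

/* Decide what an IP_PKTINFO setsockopt means from the option length alone. */
enum model_pktinfo_action
model_pktinfo_setopt(const void *val, size_t len)
{
    assert(val != NULL);

    if (len == sizeof(int)) {
        int onoff;

        memcpy(&onoff, val, sizeof(onoff));
        return onoff != 0 ? MODEL_RECVPKTINFO_ON : MODEL_RECVPKTINFO_OFF;
    }
    if (len == sizeof(struct model_pktinfo))
        return MODEL_SET_DEFAULT_SRC;
    return MODEL_REJECT;
}
```

The dispatch only works because the two accepted argument types have different sizes.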
 1.64 10-Aug-2017  ryo Add IP_PKTINFO support for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.63 02-Mar-2017  ozaki-r branches: 1.63.6;
Make sure imo_membership is protected by inp's lock (solock)
 1.62 22-Feb-2017  ozaki-r Add assertions and comments for lock states of socket and pcb
 1.61 08-Dec-2016  ozaki-r branches: 1.61.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.
 1.60 26-Apr-2016  ozaki-r branches: 1.60.2;
Sweep unnecessary route.h inclusions
 1.59 24-May-2015  rtr remove transitional functions in{,6}_pcbconnect_m() that were used in
converting protocol user requests to accept sockaddr instead of mbufs.

remove tcp_input copy in to mbuf from sockaddr and just copy to sockaddr
to make it possible for the transitional functions to go away.

no version bump since these functions only existed for a short time and
were commented as adapters (they appeared in 7.99.15).
 1.58 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.57 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.56 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.55 25-Nov-2014  seanb branches: 1.55.2;
Clean up any dangling ifp references in (struct in6pcb *)->in6p_v4moptions
(v4 multicast options off v4 mapped v6 socket) on interface destruction. The
code to clean this up in a true v4 socket was moved to its own function
which is now also called in the corresponding place for v6 sockets on
interface destruction.
 1.54 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.53 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.52 22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.51 27-Jun-2013  christos branches: 1.51.2; 1.51.6;
implement IP_PKTINFO and IP_RECVPKTINFO.
 1.50 25-Jun-2012  christos branches: 1.50.2;
rename rfc6056 -> portalgo, requested by yamt
 1.49 24-Sep-2011  christos branches: 1.49.2;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.48 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
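
The class-to-MSL mapping can be sketched as below. The default values are the ones given in the text; the function names are hypothetical, and deciding "same link" with a single netmask comparison is a simplifying assumption of this sketch.

```c
#include <stdint.h>

enum model_mslt_class {
    MODEL_MSLT_LOOPBACK,  /* local host equals remote host */
    MODEL_MSLT_LOCAL,     /* same link/subnet */
    MODEL_MSLT_REMOTE     /* reached via one or more gateways */
};

/* Classify a session by the nearness of its peer (netmask test is assumed). */
enum model_mslt_class
model_mslt_classify(uint32_t local, uint32_t remote, uint32_t netmask)
{
    if (local == remote)
        return MODEL_MSLT_LOOPBACK;
    if ((local & netmask) == (remote & netmask))
        return MODEL_MSLT_LOCAL;
    return MODEL_MSLT_REMOTE;
}

/* Default MSL in seconds for each class, as listed above. */
int
model_mslt_default_msl(enum model_mslt_class c)
{
    switch (c) {
    case MODEL_MSLT_LOOPBACK:
        return 2;
    case MODEL_MSLT_LOCAL:
        return 10;
    case MODEL_MSLT_REMOTE:
        return 60;
    }
    return 60;  /* unknown class: fall back to the longest lifetime */
}
```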

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
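
A much-reduced sketch of the pool-plus-index idea follows. It is not the VTW implementation: the structure layout, the tiny sizes, the hash function, and all names are hypothetical. It only illustrates links stored as narrow indices into a fixed-size pool and oldest-first reuse when the pool is full.

```c
#include <stdint.h>
#include <string.h>

#define VTW_POOL_SIZE 4           /* tiny, for illustration */
#define VTW_NBUCKETS  2
#define VTW_NIL       UINT16_MAX  /* "null" index */

struct model_vtw {
    uint32_t faddr;  /* foreign address */
    uint16_t fport;  /* foreign port */
    uint16_t next;   /* index of next entry in the same bucket, or VTW_NIL */
    uint8_t  used;
};

struct model_vtw_table {
    struct model_vtw pool[VTW_POOL_SIZE];
    uint16_t bucket[VTW_NBUCKETS];  /* head index per hash bucket */
    uint16_t oldest;                /* next slot to reuse, in ring order */
};

static unsigned vtw_hash(uint32_t faddr, uint16_t fport)
{
    return (faddr ^ fport) % VTW_NBUCKETS;
}

void model_vtw_init(struct model_vtw_table *t)
{
    memset(t, 0, sizeof(*t));
    for (unsigned i = 0; i < VTW_NBUCKETS; i++)
        t->bucket[i] = VTW_NIL;
}

/* Remove entry idx from its bucket chain by rewriting the index link. */
static void vtw_unlink(struct model_vtw_table *t, uint16_t idx)
{
    uint16_t *link = &t->bucket[vtw_hash(t->pool[idx].faddr,
                                         t->pool[idx].fport)];

    while (*link != VTW_NIL && *link != idx)
        link = &t->pool[*link].next;
    if (*link == idx)
        *link = t->pool[idx].next;
}

/* Insert an entry; when the pool is full the oldest entry is discarded. */
void model_vtw_insert(struct model_vtw_table *t, uint32_t faddr, uint16_t fport)
{
    uint16_t idx = t->oldest;
    struct model_vtw *e = &t->pool[idx];
    unsigned b = vtw_hash(faddr, fport);

    if (e->used)
        vtw_unlink(t, idx);
    e->faddr = faddr;
    e->fport = fport;
    e->used = 1;
    e->next = t->bucket[b];
    t->bucket[b] = idx;
    t->oldest = (uint16_t)((idx + 1) % VTW_POOL_SIZE);
}

/* Return 1 if a matching entry is present, 0 otherwise. */
int model_vtw_lookup(const struct model_vtw_table *t, uint32_t faddr,
                     uint16_t fport)
{
    for (uint16_t i = t->bucket[vtw_hash(faddr, fport)]; i != VTW_NIL;
         i = t->pool[i].next)
        if (t->pool[i].faddr == faddr && t->pool[i].fport == fport)
            return 1;
    return 0;
}
```

Each link costs two bytes here instead of a full pointer, which is where the memory saving mentioned in the text comes from.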

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.47 17-Jul-2009  minskim branches: 1.47.4; 1.47.6;
Add the IP_MINTTL socket option.

The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value. This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.
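
The filtering rule amounts to a single comparison, sketched below with a hypothetical function name. For GTSM the option would typically be set to 255, so that only packets that have not crossed a router are accepted; that usage note is background knowledge, not part of the commit message.

```c
#include <stdint.h>

/*
 * Return 1 if a packet with TTL pkt_ttl passes the IP_MINTTL check,
 * 0 if it would be discarded (its TTL is lower than the option value).
 */
int
model_minttl_accept(uint8_t pkt_ttl, uint8_t minttl)
{
    return pkt_ttl >= minttl;
}
```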

OK'ed by christos@.
 1.46 16-Jul-2009  minskim Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
 1.45 16-Dec-2007  elad branches: 1.45.10; 1.45.24;
Oops. Remove kauth.h inclusion.

Pointed out by gdt@, thanks.
 1.44 16-Dec-2007  elad Really fix low port allocation, by always passing a valid lwp to
in_pcbbind().

Okay dyoung@.

Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
 1.43 19-Sep-2007  dyoung branches: 1.43.8;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
 1.42 23-Jul-2006  ad branches: 1.42.14; 1.42.28; 1.42.30;
Use the LWP cached credentials where sane.
 1.41 10-Dec-2005  elad branches: 1.41.4; 1.41.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.40 15-Nov-2005  dsl Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that it can allocate a low-numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.39 12-Feb-2005  manu branches: 1.39.6; 1.39.12;
Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.38 21-Apr-2004  itojun branches: 1.38.4; 1.38.6;
no space between function name and paren: foo (blah) -> foo(blah)
 1.37 18-Apr-2004  matt De __P()
 1.36 23-Oct-2003  mycroft Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
 1.35 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.34 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.33 15-Jun-2003  matt branches: 1.33.2;
Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.32 02-Nov-2002  itojun cleanup ipsec.h dependency. commented by perry, sync w/kame
 1.31 09-Jun-2002  itojun whitespace
 1.30 02-Jul-2001  itojun branches: 1.30.2; 1.30.14;
on interface removal, remove multicast groups joined from pcb, before
removing interface addresses. without the change, we may deref
NULL pointer in in_pcbpurgeif(). from jinmei@kame, sync with kame
 1.29 02-Feb-2000  thorpej branches: 1.29.6;
PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.28 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.27 01-Jul-1999  itojun branches: 1.27.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.26 05-Oct-1998  lukem branches: 1.26.8; 1.26.10;
* in_pcblookup_port(): deprecate INPLOOKUP_WILDCARD and flags in favour
of a lookup_wildcard arg; simplifies the logic a bit.
* when assigning ephemeral ports in in_pcbbind(), always call
in_pcblookup_port() with lookup_wildcard=1, so that ephemeral port
allocation on sockets with SO_REUSEADDR set won't potentially bind to a
port in use by something else (principle of least surprise).
 1.25 18-May-1998  matt Move the ppcb pointer towards the front of the structure so that it and the
pcb chain pointers can possibly be in the same cache line.
 1.24 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.23 07-Jan-1998  lukem add the following, derived from FreeBSD:
* IP_PORTRANGE socket option, which controls how the ephemeral ports
are allocated. it takes the following settings:
IP_PORTRANGE_DEFAULT use anonportmin (49152) -> anonportmax (65535)
IP_PORTRANGE_HIGH as IP_PORTRANGE_DEFAULT (retained for FreeBSD
compat reasons, where these are separate)
IP_PORTRANGE_LOW use 600 -> 1023. only works if uid==0.
* in_pcb flag INP_ANONPORT. set if port was allocated ephemerally
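
The range selection listed above can be modelled as follows. The port numbers come from the list; the type and function names are hypothetical, and returning an error for a non-root IP_PORTRANGE_LOW request is an assumption of this sketch (the text only says the setting "only works if uid==0").

```c
#include <assert.h>
#include <stddef.h>

enum model_portrange {
    MODEL_PORTRANGE_DEFAULT,
    MODEL_PORTRANGE_HIGH,
    MODEL_PORTRANGE_LOW
};

struct model_range {
    unsigned lo;  /* first port in the range */
    unsigned hi;  /* last port in the range */
};

/* Fill *out with the ephemeral range; return 0 on success, -1 if refused. */
int
model_portrange_select(enum model_portrange setting, unsigned uid,
                       struct model_range *out)
{
    assert(out != NULL);

    switch (setting) {
    case MODEL_PORTRANGE_LOW:
        if (uid != 0)
            return -1;  /* low ports are reserved for root */
        out->lo = 600;
        out->hi = 1023;
        return 0;
    case MODEL_PORTRANGE_DEFAULT:
    case MODEL_PORTRANGE_HIGH:  /* same as DEFAULT, kept for compatibility */
        out->lo = 49152;        /* anonportmin */
        out->hi = 65535;        /* anonportmax */
        return 0;
    }
    return -1;
}
```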
 1.22 14-Oct-1997  matt Add support for returning maximum supported MTU when ip_output fails with
EMSGSIZE.
 1.21 22-Sep-1997  thorpej Implement in_pcbrtentry() - return the route associated with a PCB. If
one does not exist, attempt to allocate one. This is mostly pulled from
tcp_input.c.
 1.20 23-Jul-1997  thorpej branches: 1.20.2;
Pull SYN_cache_branch down into the main line.
 1.19 11-Jan-1997  thorpej branches: 1.19.8;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
 1.18 17-Sep-1996  mycroft Overlay inp_faddr and inp_laddr into the header prototype.
 1.17 15-Sep-1996  mycroft Hash unconnected PCBs.
 1.16 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.15 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.14 13-Feb-1996  christos branches: 1.14.4;
netinet prototypes
 1.13 31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.12 18-Jun-1995  cgd branches: 1.12.2;
convert pcb lists to CIRCLEQs, so that the end can be looked at more
easily, and so that the original (insque/remque) logic can be effectively
mimicked. (This fixes a bug in the previous set of list changes.)
also (since terminator is no longer null) reinstate uninitted list checks,
but mark them XXX.
 1.11 12-Jun-1995  mycroft in_pcbnotify*() don't return anything.
 1.10 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.9 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.8 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.7 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.6 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 08-Dec-1993  hpeyerl More multicast stuff.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.12.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.14.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.19.8.1 14-May-1997  mellon Change return value for in_pcbnotify().
 1.20.2.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.20.2.1 29-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.26.10.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.26.10.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.26.8.1 01-Jul-1999  thorpej Sync w/ -current.
 1.27.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.29.6.3 11-Nov-2002  nathanw Catch up to -current
 1.29.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.29.6.1 24-Aug-2001  nathanw Catch up with -current.
 1.30.14.1 20-Jun-2002  gehenna catch up with -current.
 1.30.2.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.33.2.5 11-Dec-2005  christos Sync with head.
 1.33.2.4 15-Feb-2005  skrll Sync with HEAD.
 1.33.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.33.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.33.2.1 03-Aug-2004  skrll Sync with HEAD
 1.38.6.1 12-Feb-2005  yamt sync with head.
 1.38.4.1 29-Apr-2005  kent sync with -current
 1.39.12.1 22-Nov-2005  yamt sync with head.
 1.39.6.3 27-Oct-2007  yamt sync with head.
 1.39.6.2 30-Dec-2006  yamt sync with head.
 1.39.6.1 21-Jun-2006  yamt sync with head.
 1.41.8.1 11-Aug-2006  yamt sync with head
 1.41.4.8 09-Sep-2006  rpaulo sync with head
 1.41.4.7 10-Feb-2006  rpaulo * move IN6_HASH() to ifdef INET6
* introduce INP_SOCKAF() from FreeBSD
* add IN{,6}PCBHASH_{CONNECT,PORT,BIND} from in6_pcb.c and in_pcb.c
 1.41.4.6 07-Feb-2006  rpaulo Add FreeBSD's locking defines (currently defined to nothing) and a
'struct lock' inside 'struct inpcb'.
 1.41.4.5 07-Feb-2006  rpaulo Fix wrong indentation.
 1.41.4.4 05-Feb-2006  rpaulo Remove 'inp_options' and 'in6p_options' define.
 1.41.4.3 04-Feb-2006  rpaulo struct mbuf can be shared between in4p_depend and in6p_depend.
 1.41.4.2 04-Feb-2006  rpaulo Add IN6P_{ATTACHED,BOUND,CONNECTED} for KAME src compatibility.
 1.41.4.1 01-Feb-2006  rpaulo Merge in6pcb with inpcb and remove inpcb_hdr since that's no longer needed.
 1.42.30.1 06-Nov-2007  matt sync with HEAD
 1.42.28.1 02-Oct-2007  joerg Sync with HEAD.
 1.42.14.1 09-Oct-2007  ad Sync with head.
 1.43.8.1 26-Dec-2007  ad Sync with head.
 1.45.24.1 23-Jul-2009  jym Sync with HEAD.
 1.45.10.2 19-Aug-2009  yamt sync with head.
 1.45.10.1 18-Jul-2009  yamt sync with head.
 1.47.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.47.4.1 31-May-2011  rmind sync with head
 1.49.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.49.2.1 30-Oct-2012  yamt sync with head
 1.50.2.2 03-Dec-2017  jdolecek update from HEAD
 1.50.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.51.6.1 10-Aug-2014  tls Rebase.
 1.51.2.3 18-May-2014  rmind sync with head
 1.51.2.2 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.51.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.55.2.5 28-Aug-2017  skrll Sync with HEAD
 1.55.2.4 05-Feb-2017  skrll Sync with HEAD
 1.55.2.3 29-May-2016  skrll Sync with HEAD
 1.55.2.2 06-Jun-2015  skrll Sync with HEAD
 1.55.2.1 06-Apr-2015  skrll Sync with HEAD
 1.60.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.60.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.61.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.63.6.2 18-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #639):
sys/kern/uipc_socket.c: revision 1.258
sys/kern/uipc_socket.c: revision 1.259
sys/netinet/ip_input.c: revision 1.364 (via patch)
sys/netinet/ip_output.c: revision 1.289
sys/netinet/in.h: revision 1.102
sys/netinet/in_pcb.c: revision 1.181
share/man/man9/sockopt.9: revision 1.11
sys/netinet/in_pcb.h: revision 1.65
sys/sys/socketvar.h: revision 1.146
sys/kern/uipc_syscalls.c: revision 1.189
sys/netinet/ip_output.c: revision 1.290
share/man/man4/ip.4: revision 1.41
share/man/man4/ip.4: revision 1.42
sys/kern/uipc_syscalls.c: revision 1.190

pass valsize for getsockopt like we do for setsockopt
make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo

new sentence-new line

Remove comment now that the getsockopt code passes the size.

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).
(Tom Ivar Helbekkmo)
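
The size-based handling of IP_PKTINFO described in items 4 and 5 of the
entry above can be sketched roughly as follows. This is only an
illustration, not the kernel code: struct pktinfo_sketch, the enum, and
classify_pktinfo_opt() are hypothetical names, and rejecting any other
size is an assumption, since the log only describes the two known sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in_pktinfo (see <netinet/in.h>). */
struct pktinfo_sketch {
	uint32_t     ipi_addr;     /* source address */
	unsigned int ipi_ifindex;  /* outgoing interface index */
};

/* Hypothetical result of inspecting a setsockopt(IP_PKTINFO) argument. */
enum pktinfo_action {
	PKTINFO_RECV_ON,          /* Linux style: enable IP_RECVPKTINFO */
	PKTINFO_RECV_OFF,         /* Linux style: disable IP_RECVPKTINFO */
	PKTINFO_SET_DEFAULT_SRC,  /* Solaris style: default source addr/ifindex */
	PKTINFO_INVALID           /* assumption: any other size is rejected */
};

/* Decide what an IP_PKTINFO option value means from its size alone. */
enum pktinfo_action
classify_pktinfo_opt(const void *val, size_t len)
{
	if (len == sizeof(int)) {
		int onoff;

		memcpy(&onoff, val, sizeof(onoff));
		return onoff != 0 ? PKTINFO_RECV_ON : PKTINFO_RECV_OFF;
	}
	if (len == sizeof(struct pktinfo_sketch))
		return PKTINFO_SET_DEFAULT_SRC;
	return PKTINFO_INVALID;
}
```

The sketch works only because the two accepted argument types have
different sizes, which is what lets a single option serve both the
Linux and the Solaris conventions.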
 1.63.6.1 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.65.2.1 25-Jun-2018  pgoyette Sync with HEAD
 1.16 28-Oct-2022  ozaki-r Remove in_pcb_hdr.h
 1.15 28-Aug-2020  riastradh netinet: Include the needful so include order doesn't matter.
 1.14 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.13 02-Jun-2017  ozaki-r Assert inph_locked on ipsec_pcb_skip_ipsec (was IPSEC_PCB_SKIP_IPSEC)

The assertion confirms SP caches are accessed under inph lock (solock).
 1.12 25-Apr-2017  ozaki-r Check if solock of PCB is held when SP caches in the PCB are accessed

To this end, a back pointer from inpcbpolicy to inpcb_hdr is added.
 1.11 30-May-2014  christos branches: 1.11.4; 1.11.8;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.10 23-Nov-2013  christos branches: 1.10.2;
expose the pcb queue structure for convenience
 1.9 23-Nov-2013  christos convert from CIRCLEQ to TAILQ.
 1.8 25-Jun-2012  christos branches: 1.8.2; 1.8.4;
rename rfc6056 -> portalgo, requested by yamt
 1.7 24-Sep-2011  christos branches: 1.7.2;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.6 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
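
The MSLT class selection described in the entry above can be sketched
as follows. This is a simplified model, not the committed code: the
names are hypothetical, addresses are plain 32-bit IPv4 values, and
"same link/subnet" is approximated by a single netmask comparison. Only
the three classes and their default MSLs (2, 10, and 60 seconds) come
from the log message.

```c
#include <assert.h>
#include <stdint.h>

/* The three peer classes named in the log message. */
enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE };

/* Default MSL in seconds for each class, as stated in the log message. */
static const unsigned mslt_default_msl[] = { 2, 10, 60 };

/* Classify a session by how near the peer is (hypothetical helper). */
enum mslt_class
mslt_classify(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	if (laddr == faddr)
		return MSLT_LOOPBACK;   /* local host equals remote host */
	if ((laddr & netmask) == (faddr & netmask))
		return MSLT_LOCAL;      /* same subnet (assumed test) */
	return MSLT_REMOTE;             /* reached via a gateway */
}

/* MSL, in seconds, that a session in the given situation would use. */
unsigned
mslt_msl_seconds(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	return mslt_default_msl[mslt_classify(laddr, faddr, netmask)];
}
```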
 1.5 04-Mar-2007  christos branches: 1.5.64; 1.5.70;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.4 10-Dec-2005  elad branches: 1.4.4; 1.4.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.3 02-Mar-2004  thorpej branches: 1.3.4; 1.3.18;
Define a sotoinpcb_hdr() macro (a'la sotoinpcb()).
 1.2 28-Oct-2003  provos use a hash table to bind to local ports; suggested by markus friedl
approved: fvdl@
 1.1 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.3.18.2 03-Sep-2007  yamt sync with head.
 1.3.18.1 21-Jun-2006  yamt sync with head.
 1.3.4.5 11-Dec-2005  christos Sync with head.
 1.3.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.3.4.2 03-Aug-2004  skrll Sync with HEAD
 1.3.4.1 02-Mar-2004  skrll file in_pcb_hdr.h was added on branch ktrace-lwp on 2004-08-03 10:54:36 +0000
 1.4.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.4.4.1 01-Feb-2006  rpaulo Merge in6pcb with inpcb and remove inpcb_hdr since that's no longer needed.
 1.5.70.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.64.1 31-May-2011  rmind sync with head
 1.7.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.7.2.1 30-Oct-2012  yamt sync with head
 1.8.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.8.2.2 03-Dec-2017  jdolecek update from HEAD
 1.8.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.2.1 10-Aug-2014  tls Rebase.
 1.11.8.1 26-Apr-2017  pgoyette Sync with HEAD
 1.11.4.1 28-Aug-2017  skrll Sync with HEAD
 1.1 02-Dec-2014  christos branches: 1.1.2; 1.1.18;
add routines to print in_addr and sockaddr_in (in_print and sin_print)
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 02-Dec-2014  jdolecek file in_print.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 06-Apr-2015  skrll Sync with HEAD
 1.1.2.1 02-Dec-2014  skrll file in_print.c was added on branch nick-nhusb on 2015-04-06 15:18:23 +0000
 1.131 03-Sep-2022  thorpej Garbage-collect everything related to struct domain::dom_ifqueues
(except dom_ifqueues itself, until the next kernel version bump).
It's no longer used now that nothing uses the legacy netisr mechanism.
 1.130 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.129 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.128 03-May-2018  maxv branches: 1.128.2;
Remove now unused tcpip.h includes. Some were already unused before.
 1.127 15-Mar-2018  maxv Add the PR_LASTHDR flag on the PFsync and CARP entries. Otherwise a
"require" IPsec policy is not enforced on them, and unauthenticated
packets will be accepted.

Tested with a require-AH configuration. Sent on tech-net@, no comment.
 1.126 05-Feb-2018  maxv branches: 1.126.2;
Declare icmperrppslim in ip_icmp.c, it shouldn't be used elsewhere.
 1.125 27-Sep-2017  ozaki-r Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
 1.124 21-Sep-2017  ozaki-r Invalidate rtcache based on a global generation counter

The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals to
the global counter, the cache is still valid; otherwise it is invalidated.

One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.

This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
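
The generation-counter scheme described in the entry above can be
sketched as follows. This is a single-threaded illustration under
stated assumptions: struct rtcache_sketch and the function names are
hypothetical, the cached route is an opaque pointer, and the locking
the kernel would need is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Global counter, bumped whenever any route is added or deleted. */
static uint64_t route_generation;

/* Hypothetical route cache: a cached route plus a counter snapshot. */
struct rtcache_sketch {
	const void *rt;   /* cached route entry, or NULL */
	uint64_t    gen;  /* value of route_generation when rt was cached */
};

/* Call on every route add or delete. */
void
route_table_changed(void)
{
	route_generation++;
}

/* Cache a route and remember the current generation. */
void
rtcache_store(struct rtcache_sketch *rc, const void *rt)
{
	rc->rt = rt;
	rc->gen = route_generation;
}

/* Return the cached route if still valid; otherwise drop it. */
const void *
rtcache_lookup(struct rtcache_sketch *rc)
{
	if (rc->rt != NULL && rc->gen != route_generation)
		rc->rt = NULL;  /* stale: routing table changed since caching */
	return rc->rt;
}
```

Validation is a single integer comparison per lookup, which is why no
per-cache list linkage is needed. The cost, as the log notes, is that
one change invalidates every cache in every protocol family.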
 1.123 14-Apr-2017  ozaki-r branches: 1.123.4;
Rumpify netipsec

Note that we should modularize netipsec and reduce reverse symbol references
(referencing symbols of netipsec from net, netinet and netinet6), but that
task needs lots of code changes. Before doing so, rumpifying it and
having ATF tests should be useful.
 1.122 16-Feb-2017  knakahara add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.121 13-Feb-2017  ozaki-r Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.
 1.120 26-Apr-2016  ozaki-r branches: 1.120.2; 1.120.4;
Sweep unnecessary route.h inclusions
 1.119 11-Apr-2016  ozaki-r Sweep unnecessary radix.h inclusions
 1.118 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.117 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.116 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.115 13-Oct-2015  rjs Add core networking support for SCTP.
 1.114 31-Aug-2015  ozaki-r Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
- ARP specific data are stored in the hashed list
of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
- the global timer callout with the big locks can be
removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
- it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
- it was a parameter that prevents expiration of active caches
- Removed to simplify the timer logic, but we may be able to
restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.113 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.112 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.111 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.110 05-Jun-2014  rmind branches: 1.110.4;
- Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.109 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.108 20-Mar-2014  christos branches: 1.108.2;
need compat header.
 1.107 02-Jan-2014  pooka Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.106 05-Jun-2013  christos branches: 1.106.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.105 02-Mar-2013  christos Under FAST_IPSEC, IPSEC_ESP is mandatory; GC it.
 1.104 01-Mar-2013  joerg Retire OSI network stack. OK core@
 1.103 22-Mar-2012  drochner branches: 1.103.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.102 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.101 03-May-2011  dyoung branches: 1.101.4; 1.101.8;
*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
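
The deferred-drain pattern described in the entry above can be sketched
as follows. This is a minimal single-threaded model: struct
proto_sketch and both function names are hypothetical, and a real
kernel implementation would need an appropriate atomic or locked flag
update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-protocol state: a flag plus some reclaimable cache. */
struct proto_sketch {
	bool     drain_needed;  /* set by drain, consumed by the timer */
	unsigned cached_items;  /* stands in for memory that can be freed */
};

/*
 * Drain hook: may be called with locks held, so it does no real work.
 * It only records that a drain was requested.
 */
void
proto_drain(struct proto_sketch *p)
{
	p->drain_needed = true;
}

/*
 * Fast timeout handler: runs later in a safe context and performs the
 * drain if one was requested.
 */
void
proto_fasttimo(struct proto_sketch *p)
{
	if (!p->drain_needed)
		return;
	p->drain_needed = false;
	p->cached_items = 0;  /* the actual reclamation happens here */
}
```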
 1.100 31-Mar-2011  dyoung Hide the radix-trie implementation of the forwarding table so that we
will have an easier time replacing it with something different, even if
it is a second radix-trie implementation.

sys/net/route.c and sys/net/rtsock.c no longer operate directly on
radix_nodes or radix_node_heads.

Hopefully this will reduce the temptation to implement multipath or
source-based routing using grotty hacks to the grotty old radix-trie
code, too. :-)
 1.99 16-Sep-2009  pooka branches: 1.99.4; 1.99.6;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.98 14-Sep-2009  degroote Import pfsync support from OpenBSD 4.2

The pfsync interface exposes changes in pf(4) over a pseudo-interface, and can
be used to synchronise multiple pf instances.

This work was part of my 2009 GSoC

No objection on tech-net@
 1.97 28-Feb-2009  pooka include opt_gateway
 1.96 01-Feb-2009  pooka branches: 1.96.2;
Init ipflow pool dynamically instead of using a linkset.
 1.95 25-Nov-2008  pooka Make dom_maxrtkey of inet/inet6domain the size of the ip_encap pack
structures. This is far from optimal, but gets rid of iffy
#ifdef INET in radix.c. The radix bonsai still needs lots of love
before loading domains dynamically is possible...
 1.94 24-Apr-2008  ad branches: 1.94.2; 1.94.8; 1.94.10;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.93 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.92 15-Apr-2008  thorpej branches: 1.92.2;
Make IGMP stats per-cpu.
 1.91 05-Oct-2007  dyoung branches: 1.91.18;
Work in progress: use a raw socket for GRE in IP encapsulation
instead of adding/subtracting our own IPv4 header.

There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.

There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.

I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.

A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.

Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
 1.90 19-Sep-2007  dyoung branches: 1.90.2;
Don't use INADDR_ANY to initialize a const struct, because INADDR_ANY
is not necessarily const.
 1.89 19-Sep-2007  dyoung 1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care"
sockaddr in the same family as `sa'. Write the length of the
address part at `slenp', if `slenp' is not NULL.
 1.88 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.87 13-Jun-2007  dyoung branches: 1.87.2; 1.87.6; 1.87.8;
Use __arraycount().
 1.86 06-May-2007  dyoung In AppleTalk, IPv4, and IPv6 routing domains, help sockaddr_cmp()
avoid an indirect function call by comparing the family, length,
and bytes [dom->dom_sa_cmpofs, dom->dom_sa_cmpofs + dom->dom_sa_cmplen),
corresponding to the sockaddrs' "address" members.

For ISO, actually use sockaddr_iso_cmp, for a change. Thanks to
yamt@ for pointing out my error.
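
The direct comparison described in the entry above can be sketched as
follows. This is an illustration only: struct sa_sketch mimics the
BSD sockaddr layout (length byte first, then family), sa_cmp_range() is
a hypothetical name, and the order of the checks is an assumption. The
parameters cmpofs and cmplen play the role of dom_sa_cmpofs and
dom_sa_cmplen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic sockaddr with the BSD-style length/family header. */
struct sa_sketch {
	uint8_t sa_len;
	uint8_t sa_family;
	uint8_t sa_data[14];
};

/*
 * Compare two sockaddrs without an indirect call: family first, then
 * length, then only the bytes in [cmpofs, cmpofs + cmplen), which hold
 * the address member. Returns 0 when they match.
 */
int
sa_cmp_range(const struct sa_sketch *a, const struct sa_sketch *b,
    size_t cmpofs, size_t cmplen)
{
	if (a->sa_family != b->sa_family)
		return (int)a->sa_family - (int)b->sa_family;
	if (a->sa_len != b->sa_len)
		return (int)a->sa_len - (int)b->sa_len;
	assert(cmpofs + cmplen <= a->sa_len);
	return memcmp((const uint8_t *)a + cmpofs,
	    (const uint8_t *)b + cmpofs, cmplen);
}
```

With an IPv4-like layout (2-byte port at offset 2, 4-byte address at
offset 4), calling sa_cmp_range(a, b, 4, 4) ignores the port and
compares only the address bytes.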
 1.85 02-May-2007  dyoung Remove obsolete files netinet/in_route.[ch].
 1.84 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.83 05-Mar-2007  liamjfoy branches: 1.83.2; 1.83.4;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@
 1.82 04-Mar-2007  liamjfoy inet6domain -> inetdomain

thanks simon
 1.81 04-Mar-2007  liamjfoy Initialize protocol switch with structure initializers.

ok christos@
 1.80 09-Dec-2006  dyoung branches: 1.80.2;
Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.79 23-Nov-2006  rpaulo New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.78 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.77 10-Oct-2006  dogcow change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)
 1.76 07-Sep-2006  dogcow branches: 1.76.2; 1.76.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.75 30-Aug-2006  christos add missing initializers
 1.74 28-Aug-2006  christos Remove excess initializer.
 1.73 25-Aug-2006  matt One step closer to loadable domains. Store pointers to a domain's soft
interrupt queues so if_detach can remove packets to removed interfaces from
them. This eliminates a lot of conditional ugly code in if.c
 1.72 18-May-2006  liamjfoy Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.71 11-Dec-2005  christos branches: 1.71.4; 1.71.6; 1.71.8; 1.71.12;
merge ktrace-lwp.
 1.70 19-Jul-2005  gdt Add PR_PURGEIF flag for protocols to indicate that the protocol might
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
 1.69 29-Apr-2005  yamt branches: 1.69.2;
move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.68 12-Feb-2005  manu branches: 1.68.4;
Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.67 31-Jan-2005  kim Add RFC 3378 EtherIP support, ported from OpenBSD to NetBSD by
Hans Rosenfeld (rosenfeld at grumpf.hope-2000.org)

This change makes it possible to add gif interfaces to bridges, which
will then send and receive IP protocol 97 packets. Packets are Ethernet
frames with an EtherIP header prepended.
 1.66 23-Jan-2005  matt branches: 1.66.2;
Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domains.
 1.65 04-Sep-2004  manu branches: 1.65.4;
IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.64 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.63 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.62 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.61 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.60 14-Aug-2003  itojun enforce ipsec policy on raw wildcard.
 1.59 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.58 02-Nov-2002  itojun branches: 1.58.6;
cleanup ipsec.h dependency. commented by perry, sync w/kame
 1.57 25-Sep-2002  itojun one too many whitespace
 1.56 09-Jun-2002  itojun whitespace
 1.55 04-Mar-2002  sommerfeld branches: 1.55.6; 1.55.8;
The "gif*" tunnelling interface does everything ipip does.
Move usage example from ipip.4 to gif.4
Excise ipip and stitch up the scars.
 1.54 21-Dec-2001  itojun use radix table for inbound tunnel lookup (would increase performance
for machines with a lot of tunnels).
update route cache for IPvX-over-IPv6 tunnel on path MTU discovery.
sync with kame
 1.53 21-Dec-2001  itojun call rip_ctlinput on icmp4 inputs
 1.52 21-Dec-2001  itojun move protosw fragment for gif/stf to their own source code.
reduce #ifdef in stf code. sync with kame
 1.51 13-Nov-2001  lukem add RCSIDs
 1.50 30-Oct-2001  kml Add in support for timing out IPv4 routes added due to redirects,
as discussed in tech-net several weeks ago. It turned out that
KAME had already added this functionality to the IPv6 stack, so
I followed their example in adding the sysctl variables
net.inet.icmp.rediraccept and net.inet.icmp.redirtimeout.
 1.49 10-Sep-2001  thorpej branches: 1.49.2;
Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
 1.48 21-Mar-2001  thorpej branches: 1.48.2; 1.48.4;
Add a protosw flag, PR_ABRTACPTDIS (Abort on Accept of Disconnected
Socket), and add it to the protocols that use that behavior (all
PR_LISTEN protocols except for PF_LOCAL stream sockets).
 1.47 01-Mar-2001  itojun branches: 1.47.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.46 21-Feb-2001  itojun need PR_ADDR|PR_ATOMIC for IPPROTO_EON. fix typo. from chopps, sync with kame
 1.45 20-Feb-2001  itojun ISO over IPv4/v6 by EON encapsulation. from chopps, sync with kame.
 1.44 18-Oct-2000  thorpej Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
 1.43 18-Oct-2000  itojun move tcp syn cache parameters from in_proto.c to tcp_subr.c.
it makes more sense and helps INET6-only (INET-less) build.
 1.42 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.41 27-Jul-2000  itojun implement net.inet.tcp.rstppslimit to limit TCP RSTs by packet-per-second
basis. default: 100pps

set default value for net.inet.tcp.rstratelimit to 0 (disabled),
NOTE: it does not work right for smaller-than-1/hz interval. maybe we should
nuke it, or make it impossible to set smaller-than-1/hz value.
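
A packets-per-second limit of the kind described above can be sketched as a counter that resets once per second. This is only an illustration under stated assumptions: struct pps_state and pps_check() are hypothetical names, the clock is passed in as whole seconds for simplicity, and this is not the kernel's rate-check code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-limiter state. */
struct pps_state {
	long window_start;      /* second in which the current window began */
	int count;              /* events allowed so far in this window */
};

/*
 * Return true if one more event (e.g. sending a TCP RST) is allowed at
 * time now_sec, given a limit of maxpps events per second.
 * A negative maxpps means "no limit"; zero means "allow nothing".
 */
static bool
pps_check(struct pps_state *st, long now_sec, int maxpps)
{
	if (now_sec != st->window_start) {
		/* A new one-second window has begun: start counting afresh. */
		st->window_start = now_sec;
		st->count = 0;
	}
	if (maxpps < 0)
		return true;
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```

Counting events per second avoids the problem noted above with interval-based limits, which cannot express an interval shorter than one clock tick (1/hz).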
 1.40 10-Jul-2000  itojun implement net.inet.icmp.errppslimit.
set default value for net.inet.icmp.errratelimit to 0, as a < 10ms value
does not do the right thing.
 1.39 19-Apr-2000  itojun branches: 1.39.4;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.38 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.37 15-Feb-2000  thorpej Add support for rate-limiting RSTs sent in response to no socket for
an incoming packet. Default minimum interval is 10ms. The interval
is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.
 1.36 15-Feb-2000  thorpej Add ICMP error rate limiting, based on the same for ICMP6.

Note, we're reusing the previously unused slot for "MTU discovery" (which
was moved to the "net.inet.ip" branch of the sysctl tree quite some time
ago).
 1.35 10-Feb-2000  itojun fix ip4 protosw.
gif interface and gre interface should be able to coexist.
 1.34 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.33 09-Jul-1999  thorpej branches: 1.33.2; 1.33.8;
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.32 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.31 01-Jul-1999  darrenr add PR_LISTEN to protocols which support listen(2)
 1.30 29-Apr-1999  thorpej Implement retransmit logic for the SYN cache engine. Fixes a rare condition
where one side can think a connection exists, where the other side thinks
the connection was never established.

The original problem was first reported by Ty Sarna in PR #5909. The
original fix I made to the code didn't cover all cases. The problem this
fix addresses was reported by Christoph Badura via private e-mail.

Many thanks to Bill Sommerfeld for helping me to test this code, and
for finding a subtle bug.
 1.29 14-Jan-1999  thorpej branches: 1.29.2;
Domains are associated with protocol families, not address families.
 1.28 11-Jan-1999  thorpej Adjust for the new IP-IP input path.
 1.27 22-Dec-1998  thorpej ipip_input() -> mrt_ipip_input().
 1.26 30-Sep-1998  hwr Start supporting IPPROTO_MOBILE (55) encapsulation. This is yet
another tunneling protocol used by the Mobile-IP people. See RFC 2004
for this.
 1.25 13-Sep-1998  hwr Add a gre tunnel pseudo network device. Gre = generic route encapsulation.
This device shows up like any other network interface and can be used to
tunnel L3 protocols as e.g. IP over IP.
 1.24 15-Jul-1998  thorpej Garbage collect `imp' and `hy'. We don't have the rest of the code, and
it's not like anyone is ever going to be using either of them.
 1.23 05-Jul-1998  jonathan defopt NS, NSIP.
 1.22 05-Jul-1998  jonathan defopt ISO TPIP.
 1.21 05-Jul-1998  jonathan defopt EON.
 1.20 07-May-1998  thorpej Rework the syn cache code somewhat:
- Don't use home-grown queue manipulation. Use <sys/queue.h> instead. The
data structures are a little larger, but we are otherwise wasting the
memory chunk anyway (we're already a 64-byte malloc bucket).
- Fix a bug in the cache-is-full case: if the oldest element removed from
the first non-empty bucket was the only element in the bucket, the
bucket wouldn't be removed from the bucket cache, causing queue corruption
later.
- Optimize the syn cache timers by using PRT timers rather than home-grown
decrement-and-propagate timers.

This code is now a fair bit smaller, and significantly easier to read
and understand.
 1.19 12-Jan-1998  scottr Use option header file for MROUTING
 1.18 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.17 23-Jul-1997  thorpej Pull SYN_cache_branch down into the main line.
 1.16 10-Oct-1996  christos branches: 1.16.8;
- fix NSIP; it referenced non-existing functions.
 1.15 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.14 18-Feb-1996  christos Fix PR/2095 options MROUTING did not compile.
 1.13 13-Feb-1996  christos netinet prototypes
 1.12 30-Sep-1995  thorpej Implement tcp_sysctl(). Add a sysctl option to enable/disable RFC1323
extensions to TCP. From John Kohl <jtk@kolvir.blrc.ma.us>.
 1.11 31-May-1995  mycroft Integrate multicast 3.5 distribution, with several bugs fixed and general
cleanup. This is a (working) snapshot of work in progress.
 1.10 31-May-1995  mycroft Implement IGMP v2. Based on the Multicast 3.5 distribution.
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.7 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.6 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 10-Apr-1993  glass fixed to be compliant, subservient, and to take advantage of the newly
hacked config(8)
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.16.8.1 14-May-1997  mellon Add syn_cache variables
 1.29.2.1 29-Apr-1999  perry branches: 1.29.2.1.2; 1.29.2.1.4;
pullup 1.29->1.30 (thorpej)
 1.29.2.1.4.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.29.2.1.4.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.29.2.1.4.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.29.2.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.29.2.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.29.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.33.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.33.2.3 27-Mar-2001  bouyer Sync with HEAD.
 1.33.2.2 12-Mar-2001  bouyer Sync with HEAD.
 1.33.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.39.4.4 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #143)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.39.4.3 09-Sep-2003  msaitoh Pull up rev. 1.60 via patch (requested by itojun in ticket #68):
enforce ipsec policy on raw wildcard.
 1.39.4.2 11-Mar-2001  he Pull up revision 1.47 (via patch, requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.39.4.1 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are rough estimate - we had some trial-and-error in main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.47.2.8 11-Nov-2002  nathanw Catch up to -current
 1.47.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.47.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.47.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.47.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.47.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.47.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.47.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.48.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.48.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.48.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.48.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.48.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.48.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.49.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.55.8.1 04-Oct-2003  tron Pull up revision 1.60 (requested by itojun in ticket #1409):
enforce ipsec policy on raw wildcard.
 1.55.6.1 20-Jun-2002  gehenna catch up with -current.
 1.58.6.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.58.6.6 15-Feb-2005  skrll Sync with HEAD.
 1.58.6.5 04-Feb-2005  skrll Sync with HEAD.
 1.58.6.4 24-Jan-2005  skrll Sync with HEAD.
 1.58.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.58.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.58.6.1 03-Aug-2004  skrll Sync with HEAD
 1.65.4.1 29-Apr-2005  kent sync with -current
 1.66.2.1 12-Feb-2005  yamt sync with head.
 1.68.4.1 15-Aug-2005  tron Pull up revision 1.70 (requested by gdt in ticket #661):
Add PR_PURGEIF flag for protocols to indicate that the protocol might
store a struct ifnet *, and define it for udp/tcp/rawip for INET and
INET6. When deleting a struct ifnet, invoke PRU_PURGEIF on all
protocols marked with PR_PURGEIF. Closes PR kern/29580 (mine).
 1.69.2.4 27-Oct-2007  yamt sync with head.
 1.69.2.3 03-Sep-2007  yamt sync with head.
 1.69.2.2 30-Dec-2006  yamt sync with head.
 1.69.2.1 21-Jun-2006  yamt sync with head.
 1.71.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.71.8.3 14-Sep-2006  yamt sync with head.
 1.71.8.2 03-Sep-2006  yamt sync with head.
 1.71.8.1 24-May-2006  yamt sync with head.
 1.71.6.1 01-Jun-2006  kardel Sync with head.
 1.71.4.1 09-Sep-2006  rpaulo sync with head
 1.76.4.2 10-Dec-2006  yamt sync with head.
 1.76.4.1 22-Oct-2006  yamt sync with head
 1.76.2.2 12-Jan-2007  ad Sync with head.
 1.76.2.1 18-Nov-2006  ad Sync with head.
 1.80.2.2 07-May-2007  yamt sync with head.
 1.80.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.83.4.1 11-Jul-2007  mjf Sync with head.
 1.83.2.3 09-Oct-2007  ad Sync with head.
 1.83.2.2 15-Jul-2007  ad Sync with head.
 1.83.2.1 08-Jun-2007  ad Sync with head.
 1.87.8.1 06-Nov-2007  matt sync with HEAD
 1.87.6.3 07-Oct-2007  joerg Sync with HEAD.
 1.87.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.87.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.87.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.90.2.1 06-Oct-2007  yamt sync with head.
 1.91.18.2 17-Jan-2009  mjf Sync with HEAD.
 1.91.18.1 02-Jun-2008  mjf Sync with HEAD.
 1.92.2.1 18-May-2008  yamt sync with head.
 1.94.10.2 03-Mar-2009  skrll Sync with HEAD.
 1.94.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.94.8.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.94.2.3 11-Mar-2010  yamt sync with head
 1.94.2.2 16-Sep-2009  yamt sync with head
 1.94.2.1 04-May-2009  yamt sync with head.
 1.96.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.99.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.99.4.2 31-May-2011  rmind sync with head
 1.99.4.1 21-Apr-2011  rmind sync with head
 1.101.8.2 05-Apr-2012  mrg sync to latest -current.
 1.101.8.1 18-Feb-2012  mrg merge to -current.
 1.101.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.101.4.1 17-Apr-2012  yamt sync with head
 1.103.2.3 03-Dec-2017  jdolecek update from HEAD
 1.103.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.103.2.1 23-Jun-2013  tls resync from head
 1.106.2.3 18-May-2014  rmind sync with head
 1.106.2.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.106.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.108.2.1 10-Aug-2014  tls Rebase.
 1.110.4.8 28-Aug-2017  skrll Sync with HEAD
 1.110.4.7 29-May-2016  skrll Sync with HEAD
 1.110.4.6 22-Apr-2016  skrll Sync with HEAD
 1.110.4.5 19-Mar-2016  skrll Sync with HEAD
 1.110.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.110.4.3 22-Sep-2015  skrll Sync with HEAD
 1.110.4.2 06-Jun-2015  skrll Sync with HEAD
 1.110.4.1 06-Apr-2015  skrll Sync with HEAD
 1.120.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.120.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.120.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.123.4.3 31-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #676):

sys/netinet/in_proto.c: revision 1.127
sys/netinet6/in6_proto.c: revision 1.122

Add the PR_LASTHDR flag on the PFsync and CARP entries. Otherwise a
"require" IPsec policy is not enforced on them, and unauthenticated
packets will be accepted.

Tested with a require-AH configuration. Sent on tech-net@, no comment.
 1.123.4.2 24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #305):
distrib/sets/lists/tests/mi: revision 1.762
sys/net/route.c: revision 1.198-1.201
sys/net/route.h: revision 1.114
sys/netatalk/at_proto.c: revision 1.22
sys/netinet/in_proto.c: revision 1.124
sys/netinet6/in6_proto.c: revision 1.118
sys/netmpls/mpls_proto.c: revision 1.31
sys/netnatm/natm_proto.c: revision 1.18
sys/rump/net/lib/libsockin/sockin.c: revision 1.65
sys/sys/domain.h: revision 1.33
tests/net/route/Makefile: revision 1.6
tests/net/route/t_rtcache.sh: revision 1.1
Add tests of rtcache invalidation
Remove unnecessary NULL check of rt_ifp
It's always non-NULL.
Invalidate rtcache based on a global generation counter
The change introduces a global generation counter that is incremented when any
routes have been added or deleted. When a rtcache caches a rtentry into itself,
it also stores a snapshot of the generation counter. If the snapshot equals
the global counter, the cache is still valid; otherwise it is invalidated.
One drawback of the change is that all rtcaches of all protocol families are
invalidated when any routes of any protocol families are added or deleted.
If that matters, we should have separate generation counters based on
protocol families.
This change removes LIST_ENTRY from struct route, which fixes a part of
PR kern/52515.
Remove the global lock for rtcache
Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
their users. And in existing usages a rtcache is guaranteed not to be accessed
simultaneously. So the rtcache framework doesn't need any exclusion controls
in itself.
Synchronize on rtcache_generation with rtlock
It's racy if NET_MPSAFE is enabled.
Pointed out by joerg@
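
The generation-counter scheme described in this pull-up can be sketched as follows. This is a simplified user-space illustration: all gen_ names are hypothetical, and the locking on the counter that the last item above adds (rtlock) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's rtentry. */
struct gen_rtentry {
	int unused;
};

/* A route cache that remembers the generation at which it was filled. */
struct gen_route {
	struct gen_rtentry *ro_rt;      /* cached route, NULL when invalid */
	uint64_t ro_gen;                /* snapshot of the global counter */
};

/* Global generation counter, bumped whenever any route is added or deleted. */
static uint64_t gen_rtcache_generation;

static void
gen_route_changed(void)
{
	gen_rtcache_generation++;
}

/* Cache a route together with a snapshot of the current generation. */
static void
gen_rtcache_set(struct gen_route *ro, struct gen_rtentry *rt)
{
	ro->ro_rt = rt;
	ro->ro_gen = gen_rtcache_generation;
}

/* Return the cached route if still valid, otherwise invalidate it. */
static struct gen_rtentry *
gen_rtcache_validate(struct gen_route *ro)
{
	if (ro->ro_rt != NULL && ro->ro_gen != gen_rtcache_generation)
		ro->ro_rt = NULL;       /* stale: some route changed since */
	return ro->ro_rt;
}
```

Because validity is checked lazily by each cache's own user, no list linking all caches together is needed, which is what allows the LIST_ENTRY to be dropped from struct route. The cost, as noted above, is that a single counter invalidates caches of every protocol family at once.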
 1.123.4.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
	ah:
	esp:
	ipip:
	ipcomp:
ipsec6:
	ah:
	esp:
	ipip:
	ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
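
A minimal userland sketch of the copy-and-replace idea described above. The
struct and function names (sav_sketch, sav_update_sketch) are hypothetical and
heavily simplified; they are not the kernel's struct secasvar or its update
path, and the sketch ignores locking and reference counting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified SA; not the kernel's struct secasvar. */
struct sav_sketch {
	unsigned int	spi;
	unsigned int	seq;
	unsigned char	key[16];
};

/*
 * Build an updated copy of the SA in *slot and swap it in.  Returns the
 * old SA so the caller can release it once no reader uses it, or NULL on
 * allocation failure (in which case *slot is left untouched).
 */
static struct sav_sketch *
sav_update_sketch(struct sav_sketch **slot, const unsigned char newkey[16])
{
	struct sav_sketch *oldsav = *slot;
	struct sav_sketch *newsav;

	assert(oldsav != NULL);
	newsav = malloc(sizeof(*newsav));
	if (newsav == NULL)
		return NULL;

	*newsav = *oldsav;		/* copy the unchanged variables */
	memcpy(newsav->key, newkey, sizeof(newsav->key));
	*slot = newsav;			/* publish only the fully built SA */
	return oldsav;
}
```

The point of the design is that the old SA is never modified in place, so a
reader that still holds it sees a consistent object.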
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten from
another argument (isr)
---
Fix splx not being called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is already
sorted as we need.

The newly added key_validate_savlist validates that each sav list is indeed sorted
(run only if DEBUG because it's not cheap).
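
A minimal sketch of such a sortedness check, assuming a singly linked list and
ascending order by add time. The node type, the function names and the sort
direction are assumptions made for illustration; the real code walks the sav
list of a sah and reads lft_c->sadb_lifetime_addtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; the real sav keeps the time in lft_c. */
struct sav_node {
	uint64_t	 addtime;
	struct sav_node	*next;
};

/* Return true if addtime never decreases along the list. */
static bool
savlist_is_sorted(const struct sav_node *head)
{
	const struct sav_node *sav;

	for (sav = head; sav != NULL && sav->next != NULL; sav = sav->next) {
		if (sav->addtime > sav->next->addtime)
			return false;
	}
	return true;
}

/* The walk is O(n), so only pay for it in DEBUG builds. */
static void
savlist_validate(const struct sav_node *head)
{
#ifdef DEBUG
	assert(savlist_is_sorted(head));
#else
	(void)head;
#endif
}
```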
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication, and M_AUTHIPDGM is never set on a mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
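
A small sketch of the arithmetic, assuming the soft limit is derived from the
hard limit as the message describes. The struct and field names are
hypothetical stand-ins for the lifetime fields of struct sadb_comb.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the lifetime fields of struct sadb_comb. */
struct comb_sketch {
	uint64_t hard_addtime;
	uint64_t soft_addtime;
};

static void
comb_setlifetime_sketch(struct comb_sketch *comb, uint64_t hard_addtime)
{
	assert(comb != NULL);
	memset(comb, 0, sizeof(*comb));	/* callers pass a zero-cleared comb */
	comb->hard_addtime = hard_addtime;
	/*
	 * Derive soft from hard.  Scaling the zero-cleared soft field by
	 * 80 / 100, as the original code did, would always yield 0.
	 */
	comb->soft_addtime = comb->hard_addtime * 80 / 100;
}
```

With a hard limit of 86400 seconds this gives a soft limit of 69120 seconds.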
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows omitting ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain the SP while
holding key_so_mtx in, say, key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

It's easy for hundreds of mbufs to be queued on the list under heavy
network load.
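
A minimal sketch of a capped deferral queue. The cap value, the packet type
and all names are hypothetical; the real code queues mbufs for the timer and
this sketch leaves out locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SENDUP_QUEUE_MAX 4	/* hypothetical cap, chosen for the example */

/* Hypothetical packet; stands in for an mbuf chain. */
struct pkt_sketch {
	struct pkt_sketch *next;
};

struct sendup_queue {
	struct pkt_sketch	 *head;
	struct pkt_sketch	**tailp;
	unsigned int		  len;
};

static void
sendup_queue_init(struct sendup_queue *q)
{
	q->head = NULL;
	q->tailp = &q->head;
	q->len = 0;
}

/* Queue a packet for later delivery; refuse once the cap is reached. */
static bool
sendup_enqueue(struct sendup_queue *q, struct pkt_sketch *p)
{
	assert(p != NULL);
	if (q->len >= SENDUP_QUEUE_MAX)
		return false;		/* caller drops (frees) the packet */
	p->next = NULL;
	*q->tailp = p;
	q->tailp = &p->next;
	q->len++;
	return true;
}
```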
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which is of course useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, which happens for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex, while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
in another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
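
The shape of the fix can be sketched in userland with POSIX threads. This is
only an analogue under stated assumptions: expensive_sync() is a hypothetical
stand-in for pserialize_perform, the struct and function names are invented
for the example, and the removal from the list and the localcount_drain call
are left as comments.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct drain_ctx {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	bool		sync_in_progress;	/* protected by mtx */
	unsigned int	nsyncs;			/* touched only by the syncer */
};

/* Hypothetical stand-in for pserialize_perform; must run without mtx. */
static void
expensive_sync(struct drain_ctx *c)
{
	c->nsyncs++;
}

static void
remove_and_sync(struct drain_ctx *c)
{
	pthread_mutex_lock(&c->mtx);
	/* item = remove_from_list(); */

	/* Only one thread may run the sync at a time. */
	while (c->sync_in_progress)
		pthread_cond_wait(&c->cv, &c->mtx);
	c->sync_in_progress = true;
	pthread_mutex_unlock(&c->mtx);	/* do not hold mtx across the sync */

	expensive_sync(c);

	pthread_mutex_lock(&c->mtx);
	assert(c->sync_in_progress);
	c->sync_in_progress = false;
	pthread_cond_broadcast(&c->cv);
	/* the drain of the removed item would follow here, with mtx held */
	pthread_mutex_unlock(&c->mtx);
}
```

The flag plus condvar gives the same mutual exclusion for the sync step that
holding the mutex gave, without keeping the mutex held while the sync waits.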
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.126.2.4 30-Sep-2018  pgoyette Sync with HEAD
 1.126.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.126.2.2 21-May-2018  pgoyette Sync with HEAD
 1.126.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.128.2.1 10-Jun-2019  christos Sync with HEAD
 1.4 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.3 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.2 10-Dec-2005  elad branches: 1.2.140;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.1 29-Apr-2005  yamt branches: 1.1.2; 1.1.4; 1.1.10;
move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.1.10.3 11-Dec-2005  christos Sync with head.
 1.1.10.2 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.10.1 29-Apr-2005  skrll file in_proto.h was added on branch ktrace-lwp on 2005-11-10 14:11:07 +0000
 1.1.4.1 21-Jun-2006  yamt sync with head.
 1.1.2.2 29-Apr-2005  kent sync with -current
 1.1.2.1 29-Apr-2005  kent file in_proto.h was added on branch kent-audio2 on 2005-04-29 11:29:33 +0000
 1.2.140.1 19-Mar-2016  skrll Sync with HEAD
 1.8 02-May-2007  dyoung Remove obsolete files netinet/in_route.[ch].
 1.7 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
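
A userland sketch of the "route cache points at its destination" idea from
items 1 and 3 above. Everything here is a hypothetical analogue: sa_sketch
stands in for struct sockaddr, malloc stands in for the per-family pool, and
the *_sketch functions only mirror the described behavior of sockaddr_dup(),
rtcache_setdst() and rtcache_getdst(); they are not the kernel routines.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical BSD-style sockaddr with an explicit length byte. */
struct sa_sketch {
	unsigned char	sa_len;
	unsigned char	sa_family;
	char		sa_data[14];
};

/* Route cache that points at, rather than embeds, its destination. */
struct route_sketch {
	struct sa_sketch *ro_sa;
};

/* Analogue of sockaddr_dup(): allocate storage and copy the source. */
static struct sa_sketch *
sa_dup_sketch(const struct sa_sketch *src)
{
	struct sa_sketch *dst;

	assert(src->sa_len <= sizeof(*dst));
	dst = calloc(1, sizeof(*dst));
	if (dst == NULL)
		return NULL;
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Analogue of rtcache_setdst(): 0 on success, ENOMEM on failure. */
static int
route_setdst_sketch(struct route_sketch *ro, const struct sa_sketch *sa)
{
	struct sa_sketch *copy = sa_dup_sketch(sa);

	free(ro->ro_sa);	/* in this sketch the old destination is dropped */
	ro->ro_sa = copy;
	return copy == NULL ? ENOMEM : 0;
}

/* Analogue of rtcache_getdst(): may return NULL, so callers must check. */
static const struct sa_sketch *
route_getdst_sketch(const struct route_sketch *ro)
{
	return ro->ro_sa;
}
```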
 1.6 22-Apr-2007  dyoung In in_rtflushall(), clear the route caches using rtcache_clear()
instead of rtcache_free(). It is not desirable to clear the cached
destination as well as the route, however, rtcache_free() will
eventually release all resources held by the cache, including the
destination.

Add some additional diagnostic assertions.
 1.5 18-Apr-2007  dyoung Add optimization hint for compiler. In a debug printf,
s/freeing/flushing/.
 1.4 17-Feb-2007  dyoung branches: 1.4.4; 1.4.6;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.3 17-Feb-2007  dyoung branches: 1.3.2;
s/in_rtflush/in_rtcache/g
 1.2 05-Jan-2007  joerg branches: 1.2.2;
Use rtcache_free for consistency.
 1.1 09-Dec-2006  dyoung branches: 1.1.2; 1.1.4;
Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
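
A minimal sketch of the chaining scheme in the paragraph above. The types and
the cache_* names are hypothetical; they only mirror the roles of
in_rtcache(), in_rtflush() and in_rtflushall(), and the sketch ignores route
reference counting.

```c
#include <assert.h>
#include <stddef.h>

struct rtentry_sketch { int dummy; };

struct route_cache {
	struct rtentry_sketch	 *ro_rt;	/* cached route, or NULL */
	struct route_cache	 *next;		/* chain linkage */
	struct route_cache	**prevp;
};

static struct route_cache *cache_chain;	/* all caches with ro_rt != NULL */

/* Role of in_rtcache(): record a newly cached route on the chain. */
static void
cache_insert(struct route_cache *ro, struct rtentry_sketch *rt)
{
	assert(ro->ro_rt == NULL && rt != NULL);
	ro->ro_rt = rt;
	ro->next = cache_chain;
	ro->prevp = &cache_chain;
	if (cache_chain != NULL)
		cache_chain->prevp = &ro->next;
	cache_chain = ro;
}

/* Role of in_rtflush(): forget the route and unlink from the chain. */
static void
cache_flush(struct route_cache *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt = NULL;
	*ro->prevp = ro->next;
	if (ro->next != NULL)
		ro->next->prevp = ro->prevp;
}

/* Role of in_rtflushall(): invalidate every cache on the chain. */
static void
cache_flushall(void)
{
	while (cache_chain != NULL)
		cache_flush(cache_chain);
}
```

Users of a cache then have to tolerate ro_rt turning NULL at any flush and
look the route up again.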

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.1.4.4 03-Sep-2007  yamt sync with head.
 1.1.4.3 26-Feb-2007  yamt sync with head.
 1.1.4.2 30-Dec-2006  yamt sync with head.
 1.1.4.1 09-Dec-2006  yamt file in_route.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.1.2.2 10-Dec-2006  yamt sync with head.
 1.1.2.1 09-Dec-2006  yamt file in_route.c was added on branch yamt-splraiseipl on 2006-12-10 07:19:10 +0000
 1.2.2.2 12-Jan-2007  ad Sync with head.
 1.2.2.1 05-Jan-2007  ad file in_route.c was added on branch newlock2 on 2007-01-12 01:04:14 +0000
 1.3.2.3 07-May-2007  yamt sync with head.
 1.3.2.2 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.3.2.1 17-Feb-2007  yamt file in_route.c was added on branch yamt-idlelwp on 2007-02-27 16:54:53 +0000
 1.4.6.1 11-Jul-2007  mjf Sync with head.
 1.4.4.1 08-Jun-2007  ad Sync with head.
 1.2 02-May-2007  dyoung Remove obsolete files netinet/in_route.[ch].
 1.1 09-Dec-2006  dyoung branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8; 1.1.12; 1.1.14;
Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.1.14.1 11-Jul-2007  mjf Sync with head.
 1.1.12.2 09-Jun-2007  ad Sync with head.
 1.1.12.1 08-Jun-2007  ad Sync with head.
 1.1.8.1 07-May-2007  yamt sync with head.
 1.1.6.2 12-Jan-2007  ad Sync with head.
 1.1.6.1 09-Dec-2006  ad file in_route.h was added on branch newlock2 on 2007-01-12 01:04:14 +0000
 1.1.4.3 03-Sep-2007  yamt sync with head.
 1.1.4.2 30-Dec-2006  yamt sync with head.
 1.1.4.1 09-Dec-2006  yamt file in_route.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.1.2.2 10-Dec-2006  yamt sync with head.
 1.1.2.1 09-Dec-2006  yamt file in_route.h was added on branch yamt-splraiseipl on 2006-12-10 07:19:10 +0000
 1.17 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.16 21-Sep-2015  skrll Make this compile again
 1.15 31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most codes in in.c are imported from FreeBSD as well as lltable/llentry.
 1.14 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.13 08-Jun-2015  roy Don't set errno. Thanks to skrll@
 1.12 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.11 25-Feb-2014  pooka branches: 1.11.6;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.10 21-Oct-2013  christos fix type of sysctl, from nisimura@
 1.9 02-Jun-2012  dsl branches: 1.9.2; 1.9.4;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.8 19-Oct-2009  rmind branches: 1.8.12;
Drop 3rd and 4th clauses from David Young's license.
Reviewed and approved by dyoung@ (copyright holder).
 1.7 30-Aug-2009  dyoung Stop the admin from creating nodes under net.inet.ip.interfaces or
net.inet.ip.interfaces.<ifname>.
 1.6 04-Dec-2007  dyoung branches: 1.6.16; 1.6.26; 1.6.34;
Use IFADDR_FOREACH().
 1.5 22-Feb-2007  dyoung branches: 1.5.16; 1.5.18; 1.5.24; 1.5.26;
Reverse sense of preference numbers: prefer source addresses with
higher preference numbers. Thanks to Mihai Chelaru for pointing
out my mistake.
 1.4 22-Feb-2007  dyoung Add net.inet.ip.selectsrc.default even if GETIFA_DEBUG is not
#define'd.
 1.3 16-Nov-2006  christos branches: 1.3.2; 1.3.4; 1.3.6; 1.3.8; 1.3.10;
__unused removal on arguments; approved by core.
 1.2 13-Nov-2006  dyoung Plug memory leak.
 1.1 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.
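
The address predicates from item 3 can be sketched as follows. The function
names are hypothetical and, unlike the kernel macros, these take addresses in
host byte order, which is an assumption made to keep the example short. The
private ranges are the three RFC 1918 blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from four octets. */
static uint32_t
ipv4_sketch(unsigned a, unsigned b, unsigned c, unsigned d)
{
	assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
	return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 |
	    (uint32_t)d;
}

/* Link-local unicast: 169.254/16. */
static bool
in_linklocal_sketch(uint32_t addr)
{
	return (addr & 0xffff0000u) == 0xa9fe0000u;
}

/* RFC 1918 private addresses: 10/8, 172.16/12 and 192.168/16. */
static bool
in_private_sketch(uint32_t addr)
{
	return (addr & 0xff000000u) == 0x0a000000u ||
	    (addr & 0xfff00000u) == 0xac100000u ||
	    (addr & 0xffff0000u) == 0xc0a80000u;
}
```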

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.3.10.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.3.8.4 07-Dec-2007  yamt sync with head
 1.3.8.3 26-Feb-2007  yamt sync with head.
 1.3.8.2 30-Dec-2006  yamt sync with head.
 1.3.8.1 16-Nov-2006  yamt file in_selsrc.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.3.6.2 10-Dec-2006  yamt sync with head.
 1.3.6.1 16-Nov-2006  yamt file in_selsrc.c was added on branch yamt-splraiseipl on 2006-12-10 07:19:10 +0000
 1.3.4.1 04-Mar-2007  bouyer Pull up following revision(s) (requested by dyoung in ticket #481):
sys/netinet/in_selsrc.c: revision 1.4
sys/netinet/in_selsrc.c: revision 1.5
share/man/man9/in_getifa.9: revision 1.3
Add net.inet.ip.selectsrc.default even if GETIFA_DEBUG is not
#define'd.
Reverse sense of preference numbers: prefer source addresses with
higher preference numbers. Thanks to Mihai Chelaru for pointing
out my mistake.
 1.3.2.2 18-Nov-2006  ad Sync with head.
 1.3.2.1 16-Nov-2006  ad file in_selsrc.c was added on branch newlock2 on 2006-11-18 21:39:36 +0000
 1.5.26.1 08-Dec-2007  ad Sync with head.
 1.5.24.1 08-Dec-2007  mjf Sync with HEAD.
 1.5.18.1 09-Jan-2008  matt sync with HEAD
 1.5.16.1 09-Dec-2007  jmcneill Sync with HEAD.
 1.6.34.1 21-Apr-2010  matt sync to netbsd-5
 1.6.26.1 26-Sep-2009  snj Pull up following revision(s) (requested by dyoung in ticket #937):
sys/netinet/in_selsrc.c: revision 1.7
Stop the admin from creating nodes under net.inet.ip.interfaces or
net.inet.ip.interfaces.<ifname>.
 1.6.16.2 11-Mar-2010  yamt sync with head
 1.6.16.1 16-Sep-2009  yamt sync with head
 1.8.12.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.12.1 30-Oct-2012  yamt sync with head
 1.9.4.1 18-May-2014  rmind sync with head
 1.9.2.2 03-Dec-2017  jdolecek update from HEAD
 1.9.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.11.6.3 09-Jul-2016  skrll Sync with HEAD
 1.11.6.2 22-Sep-2015  skrll Sync with HEAD
 1.11.6.1 06-Jun-2015  skrll Sync with HEAD
 1.2 31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most of the code in in.c is imported from FreeBSD, as are lltable/llentry.
 1.1 13-Nov-2006  dyoung branches: 1.1.2; 1.1.6; 1.1.8; 1.1.104; 1.1.124;
Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.1.124.1 22-Sep-2015  skrll Sync with HEAD
 1.1.104.1 03-Dec-2017  jdolecek update from HEAD
 1.1.8.2 30-Dec-2006  yamt sync with head.
 1.1.8.1 13-Nov-2006  yamt file in_selsrc.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.1.6.2 10-Dec-2006  yamt sync with head.
 1.1.6.1 13-Nov-2006  yamt file in_selsrc.h was added on branch yamt-splraiseipl on 2006-12-10 07:19:10 +0000
 1.1.2.2 18-Nov-2006  ad Sync with head.
 1.1.2.1 13-Nov-2006  ad file in_selsrc.h was added on branch newlock2 on 2006-11-18 21:39:36 +0000
 1.14 28-Aug-2020  riastradh netinet: Include the needful so include order doesn't matter.
 1.13 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.12 18-Apr-2004  matt branches: 1.12.12;
De __P()
 1.11 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.10 10-Feb-1998  perry branches: 1.10.48;
add/cleanup multiple inclusion protection.
 1.9 07-Jul-1997  phil Protect against double inclusion. PR 3524.
 1.8 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.7 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 08-Jan-1994  mycroft More prototypes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.10.48.4 11-Dec-2005  christos Sync with head.
 1.10.48.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.48.2 18-Sep-2004  skrll Sync with HEAD.
 1.10.48.1 03-Aug-2004  skrll Sync with HEAD
 1.12.12.1 21-Jun-2006  yamt sync with head.
 1.105 11-Jun-2025  ozaki-r in: get rid of unused argument from ip_newid() and ip_newid_range()
 1.104 05-Jun-2025  ozaki-r Apply if_first_addr() and if_first_addr_psref()
 1.103 19-Nov-2022  yamt branches: 1.103.2; 1.103.8;
Make arp have its own mowner

This helped me to debug mbuf leaks in arp.
(if_arp.c rev. 1.298)
 1.102 08-Mar-2021  christos mv <sys/cprng.h> include to the kernel portion
 1.101 08-Mar-2021  christos reinstate a simple version of ip_randomid()
 1.100 08-Mar-2021  christos remove now unused pseudo-random ip id code.
 1.99 08-Mar-2021  christos Use a random IPv4 ID because the shuffling algorithm used before could expose
information (Amit Klein)
 1.98 11-Sep-2020  roy branches: 1.98.2;
inet: Add SIOCGNBRINFO to retrieve neighbor state about an address
 1.97 29-Nov-2018  ozaki-r branches: 1.97.4;
Introduce and use ip_dad_enabled() and ip6_dad_enabled() functions
 1.96 19-Apr-2018  christos branches: 1.96.2;
s/static inline/static __inline/g for consistency.
 1.95 12-May-2017  ryo branches: 1.95.2; 1.95.8;
replace in_fmtaddr() by IN_PRINT(), and delete function in_fmtaddr()
 1.94 16-Jan-2017  christos branches: 1.94.4;
really, use.
 1.93 16-Jan-2017  christos rename arplog -> ARPLOG to make it clear that it is a macro and tuck-in the
buffer used for address formatting.
 1.92 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.91 02-Jan-2017  christos branches: 1.91.2;
- You can't just call the pfil hook to remove an address before an address
is removed! Hold a reference instead, remove it, and then free it.
- GC iatoifa()
 1.90 06-Dec-2016  knakahara add API to manipulate ifa->ia_hash and ia_hash_pslist_entry, and fix ia_hash_pslist_entry race by using them.

in_ifaddr_lock is required before writing ifa->ia_hash and
ia_hash_pslist_entry to serialize writer processings.

reviewed by ozaki-r@n.o.
 1.89 18-Nov-2016  knakahara We must use PSLIST_ENTRY_DESTROY after PSLIST_WRITER_REMOVE and waiting all readers done.

And then, if we want to re-insert the removed pslist element, we need to
call PSLIST_ENTRY_INIT again.

advised by riastradh@n.o and reviewed by ozaki-r@n.o, thanks.
 1.88 11-Oct-2016  roy Implement RFC 5227 2.4 Ongoing Conflict Detection and Address Defence.

If ip_dad_count is 0, then the conflict is just logged and the address
is not marked as duplicated.
 1.87 29-Sep-2016  roy in_ifscrub is no longer needed.
 1.86 29-Sep-2016  roy Set dstaddr in in_ifinit so that sppp consumers announce the correct
dstaddr in routing messages.
 1.85 18-Sep-2016  christos Dealing with arplog is a bit more complicated...
 1.84 17-Sep-2016  christos protect arplog with INET
 1.83 16-Sep-2016  roy Drop hostIsNew from in_ifinit, let the function work out if the address
has changed.
Sync address flag setup with the IPv6 counterpart.
When scrubbing the address, or setting up the address fails, restore the
old address flags as well as the old address.
 1.82 15-Sep-2016  roy Allow arplog to be used outside of if_arp.c
 1.81 13-Sep-2016  christos remove trailing spaces. userland does not catch this?
 1.80 13-Sep-2016  christos add bits for address flags
 1.79 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.78 08-Jul-2016  ozaki-r branches: 1.78.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.77 08-Jul-2016  ozaki-r Kill remaining use of the old lists of IP addresses
 1.76 06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we have
confirmed that nobody actually does.
 1.75 06-Jul-2016  ozaki-r Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.
 1.74 31-Aug-2015  ozaki-r Replace ARP cache (llinfo) with lltable/llentry

Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
- ARP specific data are stored in the hashed list
of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
- the global timer callout with the big locks can be
removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
- it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
- it was a parameter that prevents expiration of active caches
- Removed to simplify the timer logic, but we may be able to
restore the feature if really needed

Proposed on tech-kern and tech-net.
 1.73 31-Aug-2015  ozaki-r Hook up lltable/llentry with the kernel (and rumpkernel)

It is built and initialized on bootup, but there is no user for now.

Most of the code in in.c is imported from FreeBSD, as are lltable/llentry.
 1.72 16-May-2015  roy Separate ARP handling DAD from inet.
This is done by signalling the intent to try tentative addresses
and then clearing the intent once the address is setup.
When the ARP handler is installed (arp_ifinit) then it adds
dad start and stop functions to the address which are used instead
of calling ARP directly.
 1.71 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.70 01-Jul-2014  rtr branches: 1.70.4;
fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.69 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.68 29-May-2014  rmind Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.67 23-May-2014  rmind Make ip_input() static, there is no need to expose it.
 1.66 22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.65 05-Nov-2010  rmind branches: 1.65.18; 1.65.22; 1.65.32;
ip_randomid: make mechanism MP-safe and more modular.

OK matt@
 1.64 19-Jul-2010  rmind Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@
 1.63 13-Jul-2010  rmind Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@
 1.62 28-Apr-2008  martin branches: 1.62.20; 1.62.22;
Remove clause 3 and 4 from TNF licenses
 1.61 06-Feb-2008  matt branches: 1.61.6; 1.61.8; 1.61.10;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
 1.60 05-Dec-2007  dyoung Extract common code, creating a subroutine if_purgeaddrs(ifp,
family, purgeaddr) which applies function `purgeaddr' to each
address on `ifp' belonging to `family'.
 1.59 01-Sep-2007  dyoung branches: 1.59.6; 1.59.8;
Use ifreq_setaddr(), ifreq_getaddr(), sockaddr_in_init(), and
sockaddr_copy(). Constify. Compare pointers with NULL, not 0.
Don't "test truth" of pointers, but compare with NULL.
 1.58 04-Mar-2007  christos branches: 1.58.2; 1.58.10; 1.58.14; 1.58.16;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.57 23-Jul-2006  ad branches: 1.57.10;
Use the LWP cached credentials where sane.
 1.56 10-Dec-2005  elad branches: 1.56.4; 1.56.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.55 09-Mar-2005  atatat branches: 1.55.4;
Add the following nodes to the sysctl tree:

net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist

which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
 1.54 24-Jan-2005  matt branches: 1.54.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.53 21-Apr-2004  itojun branches: 1.53.4;
no space between function name and paren: foo (blah) -> foo(blah)
 1.52 18-Apr-2004  matt De __P()
 1.51 11-Nov-2003  jonathan Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
 1.50 23-Oct-2003  mycroft Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
 1.49 18-Aug-2003  itojun since we cope with packets with address on !IFF_UP interface in ip_input()
properly, IFF_UP check in INADDR_TO_IA is obsolete (or too much).
 1.48 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.47 14-Jul-2003  itojun correct igmp. from love
 1.46 26-Jun-2003  itojun branches: 1.46.2;
tabify
 1.45 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.44 12-May-2002  matt Eliminate commons.
 1.43 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.42 04-Nov-2001  matt Change a few variable/tables to const since they are read-only.
 1.41 08-Oct-2000  enami branches: 1.41.2; 1.41.4; 1.41.8;
- Keep track of allhost multicast address record we joined into
each in_ifaddr and delete it when an address is purged.
- Don't simply try to delete a multicast address record listed in the
ia_multiaddrs. It results in a dangling pointer. Let whoever holds a
reference to it delete it.
 1.40 08-Oct-2000  itojun implement multicast kludge table for IPv4.
- when all the interface addresses are removed from an interface, and there
are multicast groups still left joined, keep them in the kludge table.
- when an interface address is added again, recover multicast groups from
the kludge table.
this will avoid problems with a dangling in_ifaddr on pcmcia card removal,
due to the link from multicast group info (in_multi).

the code is basically from sys/netinet6/in6.c (jinmei@kame).

pointed out by: Shiva Shenoy <shiva_s@yahoo.com>
 1.39 30-Mar-2000  augustss branches: 1.39.4;
Remove register declarations.
 1.38 30-Mar-2000  simonb Delete redundant decl of in_socktrim() - it's in <netinet/in.h>.
 1.37 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.36 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.35 01-Jul-1999  itojun branches: 1.35.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.34 16-May-1999  thorpej Sigh, fix some broken logic in the last change to INADDR_TO_IA(), and make
the macro a little more obvious. Should fix kern/7589, from Jens A Nilsson.
 1.33 03-May-1999  thorpej In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.
 1.32 19-Dec-1998  thorpej branches: 1.32.2;
Reverse the copyright-notice-swap. It went against existing practice.
 1.31 30-Sep-1998  tls branches: 1.31.4;
Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.
 1.30 14-Aug-1998  scottr Fix the NEXT_IA_WITH_SAME_ADDR macro introduced in 1.27: it was finding
the first in_ifaddr structure with a different internet address! Reverse
the sense of the test. Spotted by and fix from Eric Haszlakiewicz.
 1.29 29-Jul-1998  tls change IN_IFADDR_HASH_SIZE to 509, which actually uses no more space than 293 due to rounding up to nearest power of two in hashinit.
 1.28 16-Jul-1998  tls Put original hash function back. It wastes a little bit of space, but is much more even -- think of the case of a web service provider, some of whose customers end up getting 'inferior service' because they're on addresses that happen to be out at the end of a hash chain. With webservers with thousands of addresses, this is a real issue. If the wasted space is a big deal, we could pick a prime number that's slightly _less_ than a power of two...
 1.27 02-Jul-1998  is The rewrite of if_arp.c to work with the hashed interface address lists
(1.44) missed a test for the right interface, making some machines answer
to some bogus arp requests (like for WHO-HAS 127.0.0.1).

The quick patch in 1.46-1.47 does not work for so-called "unnumbered"
interfaces, that is, (point-to-point) interfaces that share their local
address with another (e.g., the Ethernet) interface.

We add a macro to in_var.h, to step (in the current implementation) through
the hash chain and find more entries with the same address, and use that
in if_arp.c to find one which belongs to our interface.
 1.26 01-Jun-1998  thorpej Eek, we were wasting almost half of the in_ifaddr hash space by modulo'ing
with IN_IFADDR_HASH_SIZE. Instead, AND with the hash mask computed by
hashinit().
 1.25 29-May-1998  matt Change arp so its console log messages print out IP addresses in
dotted quad format instead of hex.
 1.24 04-May-1998  matt Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.
 1.23 29-Apr-1998  matt Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
 1.22 15-Feb-1998  tls Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.
 1.21 13-Feb-1998  tls Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
 1.20 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.19 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.18 23-Jul-1997  thorpej branches: 1.18.6;
Pull SYN_cache_branch down into the main line.
 1.17 22-May-1996  mycroft branches: 1.17.8;
Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.16 13-Feb-1996  christos branches: 1.16.4;
netinet prototypes
 1.15 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.14 04-Jun-1995  mycroft Don't cast things unnecessarily.
 1.13 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.12 31-May-1995  mycroft Implement IGMP v2. Based on the Multicast 3.5 distribution.
 1.11 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.10 29-Mar-1995  briggs KERNEL -> _KERNEL
 1.9 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.6 09-Jan-1994  mycroft Prototype the rest.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 08-Dec-1993  hpeyerl more Multicast stuff.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.16.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.17.8.1 14-May-1997  mellon Add prototype for in_setmaxmtu()
 1.18.6.1 01-Oct-1998  cgd pull up revisions 1.21-1.22, 1.27, 1.29-1.30, 1.31 (via patch) from
trunk. (tls)
 1.31.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.32.2.2 07-Jun-1999  perry pullup 1.33->1.34 (thorpej): fix INADDR_TO_IA()
 1.32.2.1 03-May-1999  perry branches: 1.32.2.1.2; 1.32.2.1.4;
pullup 1.32->1.33 (thorpej)
 1.32.2.1.4.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.32.2.1.4.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.32.2.1.4.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.32.2.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.32.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.35.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.39.4.1 17-Oct-2000  tv Pullup 1.41 [enami]:
- Keep track of allhost multicast address record we joined into
each in_ifaddr and delete it when an address is purged.
- Don't simply try to delete a multicast address record listed in the
ia_multiaddrs. It results in a dangling pointer. Let whoever holds a
reference to it delete it.

Also 1.40 [itojun, req by enami]:
implement multicast kludge table for IPv4.
- when all the interface addresses are removed from an interface, and there
are multicast groups still left joined, keep them in the kludge table.
- when an interface address is added again, recover multicast groups from
the kludge table.
this will avoid problems with a dangling in_ifaddr on pcmcia card removal,
due to the link from multicast group info (in_multi).
 1.41.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.41.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.41.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.41.2.2 20-Jun-2002  nathanw Catch up to -current.
 1.41.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.46.2.6 11-Dec-2005  christos Sync with head.
 1.46.2.5 01-Apr-2005  skrll Sync with HEAD.
 1.46.2.4 04-Feb-2005  skrll Sync with HEAD.
 1.46.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.46.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.46.2.1 03-Aug-2004  skrll Sync with HEAD
 1.53.4.1 29-Apr-2005  kent sync with -current
 1.54.2.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.55.4.5 11-Feb-2008  yamt sync with head.
 1.55.4.4 07-Dec-2007  yamt sync with head
 1.55.4.3 03-Sep-2007  yamt sync with head.
 1.55.4.2 30-Dec-2006  yamt sync with head.
 1.55.4.1 21-Jun-2006  yamt sync with head.
 1.56.8.1 11-Aug-2006  yamt sync with head
 1.56.4.1 09-Sep-2006  rpaulo sync with head
 1.57.10.1 12-Mar-2007  rmind Sync with HEAD.
 1.58.16.3 23-Mar-2008  matt sync with HEAD
 1.58.16.2 09-Jan-2008  matt sync with HEAD
 1.58.16.1 06-Nov-2007  matt sync with HEAD
 1.58.14.2 09-Dec-2007  jmcneill Sync with HEAD.
 1.58.14.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.58.10.1 03-Sep-2007  skrll Sync with HEAD.
 1.58.2.1 09-Oct-2007  ad Sync with head.
 1.59.8.1 08-Dec-2007  ad Sync with head.
 1.59.6.2 18-Feb-2008  mjf Sync with HEAD.
 1.59.6.1 08-Dec-2007  mjf Sync with HEAD.
 1.61.10.1 16-May-2008  yamt sync with head.
 1.61.8.1 18-May-2008  yamt sync with head.
 1.61.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.62.22.1 05-Mar-2011  rmind sync with head
 1.62.20.1 06-Nov-2010  uebayasi Sync with HEAD.
 1.65.32.1 10-Aug-2014  tls Rebase.
 1.65.22.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.65.18.2 03-Dec-2017  jdolecek update from HEAD
 1.65.18.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.70.4.7 28-Aug-2017  skrll Sync with HEAD
 1.70.4.6 05-Feb-2017  skrll Sync with HEAD
 1.70.4.5 05-Dec-2016  skrll Sync with HEAD
 1.70.4.4 05-Oct-2016  skrll Sync with HEAD
 1.70.4.3 09-Jul-2016  skrll Sync with HEAD
 1.70.4.2 22-Sep-2015  skrll Sync with HEAD
 1.70.4.1 06-Jun-2015  skrll Sync with HEAD
 1.78.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.78.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.78.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.78.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.91.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.94.4.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.95.8.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.95.8.1 22-Apr-2018  pgoyette Sync with HEAD
 1.95.2.1 09-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1662):

sys/netinet/tcp_subr.c: revision 1.286
sys/netinet/tcp_timer.c: revision 1.96
sys/netinet/in_var.h: revision 1.102
sys/netinet/in_var.h: revision 1.99

Don't increment the iss sequence on each connection because it exposes
information (Amit Klein)

Add some randomness to the iss offset

Use a random IPv4 ID because the shuffling algorithm used before could expose
information (Amit Klein)

mv <sys/cprng.h> include to the kernel portion
 1.96.2.1 10-Jun-2019  christos Sync with HEAD
 1.97.4.1 09-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1229):

sys/netinet/tcp_subr.c: revision 1.286
sys/netinet/tcp_timer.c: revision 1.96
sys/netinet/in_var.h: revision 1.102
sys/netinet/in_var.h: revision 1.99

Don't increment the iss sequence on each connection because it exposes
information (Amit Klein)

Add some randomness to the iss offset

Use a random IPv4 ID because the shuffling algorithm used before could expose
information (Amit Klein)

mv <sys/cprng.h> include to the kernel portion
 1.98.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.103.8.1 02-Aug-2025  perseant Sync with HEAD
 1.103.2.2 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns a first address (ifa) of a given family on a given interface.

It will replace a bunch of open codes and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.103.2.1 29-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1140):

sys/netinet/ip_output.c: revision 1.330
sys/netinet/sctp_output.c: revision 1.39
sys/netinet/ip_mroute.c: revision 1.166
sys/netipsec/ipsecif.c: revision 1.24
sys/netipsec/xform_ipip.c: revision 1.80
sys/netinet/ip_output.c: revision 1.327
sys/netinet/ip_output.c: revision 1.328
sys/netinet/ip_input.c: revision 1.406
sys/netinet/ip_output.c: revision 1.329
sys/netinet/in_var.h: revision 1.105

in: get rid of unused argument from ip_newid() and ip_newid_range()

in: take a reference of ifp on IP_ROUTETOIF
The ifp could be released after ia4_release(ia).

in: narrow the scope of ifa in ip_output (NFC)

sctp: follow the recent change of ip_newid()

in: avoid racy ifa_acquire(rt->rt_ifa) in ip_output()
If a rtentry is being destroyed asynchronously, ifa referenced by rt_ifa
can be destructed and taking ifa_acquire(rt->rt_ifa) aborts with a
KASSERT failure. Fortunately, the ifa is not actually freed because rt_ifa
still references it, so it remains usable (except by some functions such as
psref) as long as the rtentry is held.
PR kern/59527

in: avoid racy ia4_acquire(ifatoia(rt->rt_ifa)) in ip_rtaddr()
Same as the case of ip_output(), it's racy and should be avoided.
PR kern/59527
 1.39 17-Apr-2022  andvar fix various typos in comments.
 1.38 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.37 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
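A minimal sketch of the idea behind this change, assuming a C11 compiler: a compile-time assertion pins the size and field offsets of a wire-format structure, so the layout is checked without the __packed attribute. The structure `demo_hdr` and its fields are hypothetical and are not taken from the NetBSD headers; the kernel uses its own CTASSERT/__CTASSERT macros rather than `_Static_assert`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte protocol header; every field is naturally aligned. */
struct demo_hdr {
	uint8_t  dh_type;   /* offset 0 */
	uint8_t  dh_code;   /* offset 1 */
	uint16_t dh_cksum;  /* offset 2 */
	uint32_t dh_id;     /* offset 4 */
};

/* The build fails if the compiler inserts padding or reorders anything. */
_Static_assert(sizeof(struct demo_hdr) == 8,
    "demo_hdr must be exactly 8 bytes on the wire");
_Static_assert(offsetof(struct demo_hdr, dh_cksum) == 2,
    "dh_cksum must sit at offset 2");
_Static_assert(offsetof(struct demo_hdr, dh_id) == 4,
    "dh_id must sit at offset 4");
```

Because the checks run at compile time, a layout regression shows up as a build error rather than as corrupted packets.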
 1.36 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.35 28-Aug-2020  riastradh branches: 1.35.2;
netinet: Include the needful so include order doesn't matter.
 1.34 02-Nov-2012  christos fix typo
 1.33 02-Nov-2012  christos make this standalone, like every others (except OpenBSD)
 1.32 24-Jul-2011  christos branches: 1.32.2; 1.32.8; 1.32.12; 1.32.14;
Fill in missing IPTOS defines (from Linux/OpenBSD)
 1.31 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.30 21-Dec-2007  matt Add fix for ip_id information leakage. Since the leakage information is
primarily used with TCP SYN and RST packets and such packets are less than
the smallest sized packet that an IP stack is allowed to fragment, we simply
set ip_id to 0 for all packets 68 bytes or less.
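The rule described in this entry can be sketched as follows. This is an illustration only: the function name `demo_pick_ip_id` and the constant `DEMO_MIN_FRAG_SIZE` are hypothetical, and only the 68-byte threshold comes from the log message.

```c
#include <stddef.h>
#include <stdint.h>

/* Packets of this size or smaller are given an ID of 0 (per the log). */
#define DEMO_MIN_FRAG_SIZE 68

/*
 * Return the IP ID to place in an outgoing packet of pkt_len bytes.
 * fresh_id is the ID the stack would otherwise have used.
 */
static uint16_t
demo_pick_ip_id(size_t pkt_len, uint16_t fresh_id)
{
	if (pkt_len <= DEMO_MIN_FRAG_SIZE)
		return 0;	/* small packet: reveal nothing about the ID sequence */
	return fresh_id;
}
```

Under these assumptions a 40-byte TCP SYN gets ID 0, while a 1500-byte packet keeps the ID it was handed.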
 1.29 17-Dec-2006  christos branches: 1.29.20; 1.29.26; 1.29.28; 1.29.32;
According to ANSI C the only portably defined bitfields are unsigned int ones.
 1.28 05-Sep-2006  rpaulo branches: 1.28.2; 1.28.4;
Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
 1.27 10-Dec-2005  elad branches: 1.27.4; 1.27.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.26 25-Apr-2004  jonathan branches: 1.26.12;
Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c,
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.25 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.24 01-Apr-2003  dogcow branches: 1.24.2;
bring into conformance with RFC 3514
 1.23 05-Mar-2002  itojun bring in latest ALTQ from kjc. ALTQify some of the drivers.
 1.22 24-Oct-2001  itojun it may fix PR14124.
 1.21 02-May-2000  sommerfeld branches: 1.21.6; 1.21.8; 1.21.12;
One more __attribute__((__packed__)) to dissuade egcs from making
unwarranted assumptions about the structure's alignment.
 1.20 20-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol data.
 1.19 01-Jul-1999  itojun branches: 1.19.2; 1.19.8;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.18 10-Feb-1998  perry branches: 1.18.8; 1.18.10; 1.18.12;
add/cleanup multiple inclusion protection.
 1.17 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.16 11-Dec-1996  mycroft Minor change to a comment.
 1.15 25-Oct-1996  thorpej Make length and offset fields unsigned. From Kevin M. Lahey <kml@nas.nasa.gov>
 1.14 21-Sep-1996  perry commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
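A hedged sketch of the masking this entry describes. The flag values match the conventional layout of the IPv4 flags/fragment-offset field, but the `DEMO_` names and the helper `demo_is_fragment` are hypothetical and do not reproduce the actual ip_input.c code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the 16-bit flags/fragment-offset field, in host byte order. */
#define DEMO_IP_RF      0x8000u	/* reserved fragment flag */
#define DEMO_IP_DF      0x4000u	/* don't fragment */
#define DEMO_IP_MF      0x2000u	/* more fragments */
#define DEMO_IP_OFFMASK 0x1fffu	/* fragment offset */

/*
 * A packet is a fragment only if MF is set or the offset is non-zero.
 * Masking out both DF and RF keeps a stray reserved bit from making an
 * unfragmented packet look like a fragment.
 */
static bool
demo_is_fragment(uint16_t ip_off_host)
{
	return (ip_off_host & ~(DEMO_IP_DF | DEMO_IP_RF)) != 0;
}
```

With only DF masked, a packet carrying the reserved bit would be treated as a fragment; masking RF as well avoids that.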
 1.13 14-Sep-1996  mrg move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.
 1.12 12-Sep-1996  mrg forward decl. struct mbuf (for now).
 1.11 12-Sep-1996  explorer Move an #ifdef _KERNEL up above all the packet filter stuff. This
could very well break the packet filter stuff, but it will make things
like rcp.c compile, and rcp.c should not need to include sys/mbuf.h
to do so...
 1.10 06-Sep-1996  mrg add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.
 1.9 15-May-1995  cgd branches: 1.9.6;
"routine" precedence has a value of 0.
 1.8 17-Apr-1995  cgd spacing cleanup. also, minor type mixup fixups.
 1.7 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.9.6.2 11-Dec-1996  mycroft From trunk:
Ignore the reserved fragment flag when checking ip_off.
 1.9.6.1 10-Nov-1996  thorpej Update from trunk:
- Make ip_len and ip_off unsigned.
- Make sure we don't accept or transmit packets larger than the
maximum IP packet size.
This fixes the so-called `death ping' bug.

Sum of work from Bill Fenner <fenner@parc.xerox.com>,
Kevin Lahey <kml@nas.nasa.gov>, and myself.

Thanks to Curt Sampson, Jukka Marin, and Kevin Lahey for testing
this under NetBSD 1.2
 1.18.12.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.18.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.18.10.1 01-Jul-1999  thorpej Sync w/ -current.
 1.18.8.1 05-May-2000  cgd Pull up revisions 1.20-1.21 (requested by sommerfeld):
Add "__attribute__((__packed__))" to structures used to describe
on-the-wire data, to prevent egcs from making unwarranted assumptions
about the alignment of these structures.
 1.19.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.19.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.21.12.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.21.8.2 16-Mar-2002  jdolecek Catch up with -current.
 1.21.8.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.21.6.2 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.21.6.1 14-Nov-2001  nathanw Catch up to -current.
 1.24.2.4 11-Dec-2005  christos Sync with head.
 1.24.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.24.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.24.2.1 03-Aug-2004  skrll Sync with HEAD
 1.26.12.3 21-Jan-2008  yamt sync with head
 1.26.12.2 30-Dec-2006  yamt sync with head.
 1.26.12.1 21-Jun-2006  yamt sync with head.
 1.27.8.1 14-Sep-2006  yamt sync with head.
 1.27.4.1 09-Sep-2006  rpaulo sync with head
 1.28.4.1 18-Dec-2006  yamt sync with head.
 1.28.2.1 12-Jan-2007  ad Sync with head.
 1.29.32.1 02-Jan-2008  bouyer Sync with HEAD
 1.29.28.1 26-Dec-2007  ad Sync with head.
 1.29.26.1 18-Feb-2008  mjf Sync with HEAD.
 1.29.20.1 09-Jan-2008  matt sync with HEAD
 1.32.14.1 17-Aug-2017  martin Pull up following revision(s) (requested by mrg in ticket #721):
include/resolv.h: revision 1.40
sys/netinet/ip.h: revision 1.33-1.34
fix typo
make this standalone, like every others (except OpenBSD)
add <netinet/in.h> because it is needed for sockaddr_in.
 1.32.12.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.32.8.1 28-Nov-2012  riz Pull up following revision(s) (requested by christos in ticket #721):
include/resolv.h: revision 1.40
sys/netinet/ip.h: revision 1.33
sys/netinet/ip.h: revision 1.34
fix typo
make this standalone, like every others (except OpenBSD)
add <netinet/in.h> because it is needed for sockaddr_in.
 1.32.2.1 16-Jan-2013  yamt sync with (a bit old) head
 1.35.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.30 07-Mar-2021  christos netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)
 1.29 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.28 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.27 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.26 27-Jul-2020  roy branches: 1.26.2;
ip6: Remove __packed attribute from ip6 structures

They should naturally align.
Add compile time assertions to ip6_input.c to prove this.
 1.25 18-May-2018  maxv branches: 1.25.6;
IP6_EXTHDR_GET performs a basic mbuf operation, which has nothing to do
with IPv6. So declare an IP-independent M_REGION_GET, and make
IP6_EXTHDR_GET an alias to it.
 1.24 18-May-2018  maxv Remove IP6_EXTHDR_GET0, remove pointless XXXs, and style.
 1.23 25-Dec-2007  perry branches: 1.23.2; 1.23.90; 1.23.96;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.22 04-Mar-2007  christos branches: 1.22.16; 1.22.22; 1.22.24; 1.22.28;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.21 05-May-2006  rpaulo branches: 1.21.14;
Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.20 10-Dec-2005  elad branches: 1.20.4; 1.20.6; 1.20.8; 1.20.10; 1.20.12;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.19 29-May-2005  christos branches: 1.19.2;
- add const
- remove bogus casts
- avoid nested variables
 1.18 09-Jul-2004  itojun typo. Bruno Rohee
 1.17 26-Apr-2004  itojun declare ip6_hdr_pseudo (for kernel only) and use it for TCP MD5 signature
 1.16 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.15 06-Jun-2003  itojun branches: 1.15.2;
- sync up MLD declaration with RFC3542 (s/MLD6/MLD/)
- routing header declaration with RFC3542
(note: sizeof(ip6_rthdr0) has changed!)
also, sync up with RFC2460 routing header definition (no "strict" source
routing mode any more)

part of advanced API update (RFC2292 -> 3542).
 1.14 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.13 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.12 05-Jul-2001  itojun IP6_EXTHDR_GET0 had no check against m->m_len (no one was using this macro).
sync with kame
 1.11 23-Jan-2001  itojun branches: 1.11.2;
put attribute(packed) for ip6 option headers. they will appear at
strange alignment positions. sync with kame
 1.10 10-Oct-2000  itojun sync with kame ($KAME$)
 1.9 13-Jul-2000  itojun remove m_pulldown statistics code. it is highly experimental and belong
to kame tree only (not for *bsd).
 1.8 02-Jul-2000  itojun typo in previous
 1.7 02-Jul-2000  itojun do not touch struct ip6stat on non-INET6 compilation.
From: Paul Goyette <paul@whooppee.com>
 1.6 03-Mar-2000  itojun branches: 1.6.4;
comment fix, sync with kame.
 1.5 24-Feb-2000  itojun hide declaration of IP6_EXTHDR_{GET,CHECK} from userland.
 1.4 06-Feb-2000  itojun to be more rfc2292 compliant, move ip6.h and icmp6.h into netinet.
(netinet6/{ip6,icmp6}.h is non-standard path - these files should go away)

it was not possible to use cvsmove in this case.
when you try to look at history, chase it toward netinet6/{ip6,icmp6}.h.
 1.3 03-Jul-1999  thorpej branches: 1.3.2;
RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip6.h was initially added on branch kame.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip6.h was added on branch chs-ubc2 on 1999-07-01 23:47:01 +0000
 1.3.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.3.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.6.4.3 14-Jul-2000  itojun pullup (approved by releng-1-5)

remove m_pulldown statistics code. it is highly experimental and belong
to kame tree only (not for *bsd).

1.4 -> 1.5 syssrc/sys/kern/uipc_mbuf2.c
1.8 -> 1.9 syssrc/sys/netinet/ip6.h
1.13 -> 1.14 syssrc/sys/netinet6/ip6_var.h
 1.6.4.2 03-Jul-2000  thorpej Pull up rev. 1.8:
typo in previous
 1.6.4.1 03-Jul-2000  thorpej Pull up rev. 1.7:
do not touch struct ip6stat on non-INET6 compilation.
From: Paul Goyette <paul@whooppee.com>
 1.11.2.2 11-Nov-2002  nathanw Catch up to -current
 1.11.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.15.2.5 11-Dec-2005  christos Sync with head.
 1.15.2.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.15.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.15.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.15.2.1 03-Aug-2004  skrll Sync with HEAD
 1.19.2.3 21-Jan-2008  yamt sync with head
 1.19.2.2 03-Sep-2007  yamt sync with head.
 1.19.2.1 21-Jun-2006  yamt sync with head.
 1.20.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.20.10.1 11-May-2006  elad sync with head
 1.20.8.1 24-May-2006  yamt sync with head.
 1.20.6.1 01-Jun-2006  kardel Sync with head.
 1.20.4.1 09-Sep-2006  rpaulo sync with head
 1.21.14.1 12-Mar-2007  rmind Sync with HEAD.
 1.22.28.1 02-Jan-2008  bouyer Sync with HEAD
 1.22.24.1 26-Dec-2007  ad Sync with head.
 1.22.22.1 18-Feb-2008  mjf Sync with HEAD.
 1.22.16.1 09-Jan-2008  matt sync with HEAD
 1.23.96.1 21-May-2018  pgoyette Sync with HEAD
 1.23.90.1 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1661):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.23.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.25.6.1 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1226):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.26.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.1 22-Feb-2008  keiichi branches: 1.1.2;
file ip6mh.h was initially added on branch keiichi-mipv6.
 1.1.2.1 22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.35 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.34 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.33 28-Mar-2004  martti branches: 1.33.2;
Upgraded IPFilter to 4.1.1
 1.32 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.31 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.30 19-Sep-2002  martti branches: 1.30.6;
Resync with official IPF
 1.29 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.28 09-Jun-2002  itojun whitespace
 1.27 02-May-2002  martti branches: 1.27.2; 1.27.4;
Fix compilation problems
 1.26 02-May-2002  martti Upgraded IPFilter to 3.4.27
 1.25 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.24 24-Jan-2002  martti Re-sync with IPFilter
 1.23 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.22 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.21 13-Nov-2001  lukem add RCSIDs
 1.20 28-Sep-2001  chs don't depend on other headers to include sys/proc.h for us.
 1.19 26-Mar-2001  mike branches: 1.19.2; 1.19.4;
Resolve conflicts.
 1.18 09-Aug-2000  veego branches: 1.18.2;
Resolve conflicts.
 1.17 23-May-2000  veego branches: 1.17.4;
Resolve conflicts.
 1.16 03-May-2000  veego Resolve conflicts.
 1.15 16-Apr-2000  chs remove ifdefs to skip htons() on some big-endian platforms.
 1.14 30-Mar-2000  augustss Remove register declarations.
 1.13 01-Feb-2000  veego Resolve conflicts.
 1.12 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.11 02-Feb-1999  cjs branches: 1.11.2; 1.11.6; 1.11.8; 1.11.14;
Remove SCCS markers and make these compile in $NetBSD$ IDs.
 1.10 19-Jan-1999  mycroft There's just no plausible reason to byte-swap ip_id internally. It's opaque.
 1.9 22-Nov-1998  mrg merge ipf 3.2.10
 1.8 12-Jul-1998  veego Resolve conflicts from the import.
 1.7 17-May-1998  veego Resolve conflicts
 1.6 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.5 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.4 21-Sep-1997  veego branches: 1.4.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.3 07-Jul-1997  fvdl branches: 1.3.2;
Get rid of (void) cast to KFREE, as it may be a macro, in which case
the cast will be a syntax error.
 1.2 06-Jul-1997  thorpej The sheer number of #ifdef's around it should have been a hint that
#include <machine/mtpr.h> isn't something you're supposed to do in
NetBSD.
 1.1 06-Jul-1997  thorpej branches: 1.1.1;
Initial revision
 1.1.1.19 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.18 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.17 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.16 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.15 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.14 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.13 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.12 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.11 23-May-2000  veego Import IP Filter 3.4.4
 1.1.1.10 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.9 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.8 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.7 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.6 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.5 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.4 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.3 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.2 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.1 06-Jul-1997  thorpej Import yet another missing piece of IPFilter 3.2beta1.
 1.3.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.4.2.3 24-Nov-1998  cgd pull up rev(s) 1.9 from trunk (ipfilter 3.2.10). (mrg)
 1.4.2.2 22-Jul-1998  mellon Pull up 1.8
 1.4.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.11.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.8.2 27-Mar-2001  bouyer Sync with HEAD.
 1.11.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.11.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.11.2.1 20-Dec-1999  he Pull up revision 1.12 (requested by darrenr):
Update IPF to version 3.3.5.
 1.17.4.3 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.17.4.2 09-Feb-2002  he Pull up revisions 1.19-1.24 (requested by martti):
Updated IPFilter to 3.4.23.
 1.17.4.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Caseper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Caseper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.18.2.9 20-Sep-2002  thorpej Sync with HEAD.
 1.18.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.18.2.7 04-May-2002  thorpej Update from trunk.
 1.18.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.18.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.18.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.18.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.18.2.2 08-Oct-2001  nathanw Catch up to -current.
 1.18.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.19.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.19.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.19.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.19.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.19.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.27.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.27.2.1 20-Jun-2002  gehenna catch up with -current.
 1.30.6.2 19-Oct-2004  skrll Sync with HEAD
 1.30.6.1 03-Aug-2004  skrll Sync with HEAD
 1.33.2.1 13-Aug-2004  jmc branches: 1.33.2.1.2;
Pullup rev 1.34 (requested by christos in ticket #1727)

Sync up w. ipf 4.1.3
 1.33.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.35 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.12 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.11 28-Mar-2004  martti branches: 1.11.4;
Upgraded IPFilter to 4.1.1
 1.10 24-Jan-2002  martti branches: 1.10.16;
Re-sync with IPFilter
 1.9 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.8 26-Mar-2001  mike branches: 1.8.2;
Resolve conflicts.
 1.7 23-May-2000  veego branches: 1.7.4; 1.7.6;
Resolve conflicts.
 1.6 03-May-2000  veego Resolve conflicts.
 1.5 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.4 22-Nov-1998  mrg branches: 1.4.4; 1.4.10; 1.4.16;
merge ipf 3.2.10
 1.3 13-Sep-1998  christos Fix copyright spacing and 'Van' -> 'van' for consistency.
 1.2 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.1 06-Jul-1997  thorpej branches: 1.1.1;
Initial revision
 1.1.1.13 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.12 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.11 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.10 23-May-2000  veego Import IP Filter 3.4.4
 1.1.1.9 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.8 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.7 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.6 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.5 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.4 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.3 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.2 21-Sep-1997  veego branches: 1.1.1.2.2;
Import ip-filter 3.2beta5
 1.1.1.1 06-Jul-1997  thorpej branches: 1.1.1.1.2;
Import ip_auth.h from IPFilter 3.2beta1; this was missed during the
upgrade.
 1.1.1.2.2.3 24-Nov-1998  cgd pull up rev(s) 1.4 from trunk (ipfilter 3.2.10). (mrg)
 1.1.1.2.2.2 22-Jul-1998  mellon Pull up 1.2 (veego)
 1.1.1.2.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.1.1.1.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.4.16.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.4.10.2 27-Mar-2001  bouyer Sync with HEAD.
 1.4.10.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.4.1 20-Dec-1999  he Pull up revision 1.5 (requested by darrenr):
Update IPF to version 3.3.5.
 1.7.6.2 28-Feb-2002  nathanw Catch up to -current.
 1.7.6.1 09-Apr-2001  nathanw Catch up with -current.
 1.7.4.1 09-Feb-2002  he Pull up revisions 1.8-1.10 (requested by martti):
Updated IPFilter to 3.4.23
 1.8.2.1 11-Feb-2002  jdolecek Sync w/ -current.
 1.10.16.4 19-Oct-2004  skrll Sync with HEAD
 1.10.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.10.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.10.16.1 03-Aug-2004  skrll Sync with HEAD
 1.11.4.1 06-Feb-2005  jmc Pull up revision 1.12 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.121 20-Dec-2024  rin carp_join_multicast: Stop allocating ip_moptions on stack

It was just a waste of memory. NFC otherwise, and no regression
observed for full ATF run on amd64.

Partially taken from OpenBSD:
https://github.com/openbsd/src/commit/1f237790b75

XXX
Seems like OpenBSD has some fixes to carp, that may improve our
implementation...
 1.120 01-Aug-2023  mrg branches: 1.120.6;
fix simple mis-matched function prototype and definitions.

most of these are like, eg

void foo(int[2]);

with either of these

void foo(int*) { ... }
void foo(int[]) { ... }

in some cases (such as stat or utimes* calls found in our header files),
we now match standard definition from opengroup.

found by GCC 12.
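
The mismatch described above can be illustrated with a small sketch. The function name `sum_pair` is hypothetical and does not come from the tree. The only point is that the prototype and the definition spell the array parameter the same way, which is the consistency this commit restores.

```c
#include <assert.h>
#include <stddef.h>

/* Prototype: the parameter is declared as an array of two ints. */
int sum_pair(const int v[2]);

/*
 * Definition: same spelling as the prototype.  Writing `const int *v`
 * or `const int v[]` here would still compile, because all three forms
 * adjust to a pointer, but it is the kind of mismatch being cleaned up.
 */
int
sum_pair(const int v[2])
{
	assert(v != NULL);
	return v[0] + v[1];
}
```

Keeping one spelling in both places means a reader of the header sees the same contract as a reader of the implementation.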
 1.119 07-Apr-2023  mlelstv Select virtual address as sender if backing interface is anonymous.
Use correct scope for IPv6.
 1.118 26-Mar-2023  mlelstv Use backing device to send advertisements. Otherwise the packets originate
from the virtual MAC address, which confuses switches.
 1.117 02-Sep-2022  thorpej branches: 1.117.4;
Remove unnecessary inclusion of <net/netisr.h>.
 1.116 30-Sep-2021  yamaguchi carp: Register carp_carpdev_state to link-state change hook
 1.115 16-Jun-2021  riastradh if_attach and if_initialize cannot fail, don't test return value

These were originally made failable back in 2017 when if_initialize
allocated a softint in every interface for link state changes, so
that it could fail gracefully instead of panicking:

https://mail-index.NetBSD.org/source-changes/2017/10/23/msg089053.html

However, this spawned many seldom- or never-tested error branches,
which are risky to have around. And that softint in every interface
has since been replaced by a single global workqueue, because link
state changes require thread context but not low latency or high
throughput:

https://mail-index.NetBSD.org/source-changes/2020/02/06/msg113759.html

So there is no longer any reason for if_initialize to fail. (The
subroutine if_stats_init can't fail because percpu_alloc can't fail
either.)

There is a snag: the softint_establish in if_percpuq_create could
fail, potentially leading to bad consequences later on trying to use
the softint. This change doesn't introduce any new bugs because of
the snag -- if_percpuq_attach was already broken. However, the snag
can be better addressed without spawning error branches, either by
using a single softint or making softints less scarce.

(Separate commit will change the signatures of if_attach and
if_initialize to return void, scheduled to ride whatever is the next
convenient kernel bump.)

Patch and testing on amd64 and evbmips64-eb by maya@; commit message
soliloquy, and compile-testing on evbppc/i386/earmv7hf, by me.
 1.114 14-Oct-2020  roy branches: 1.114.6;
carp: Don't set a link level address if vhid == -1

Link level address for carp is derived from vhid.
Until vhid is set, carp is useless, so don't give it a link level address
until a vhid is set.

This fixes recent test case breakage where carp was fixed to actually
print the ethernet address set by default. Note that neither carp nor
the test case itself was actually broken as the error is the common
ATF net code assuming that a cloned interface's link level address is
unique upon creation.
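
As a rough illustration of why the address is meaningless before a vhid is set, here is a minimal sketch of deriving a link-level address from a vhid. It assumes the VRRP-style layout 00:00:5e:00:01:<vhid> that CARP implementations conventionally use and a valid vhid range of 1 to 255. The function name `carp_lladdr_from_vhid` and its return convention are hypothetical, not the kernel's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6

/*
 * Fill `lla` with the virtual address for `vhid`.
 * Returns 0 on success, or -1 (leaving `lla` all-zero) when no valid
 * vhid has been configured yet, e.g. vhid == -1.
 */
int
carp_lladdr_from_vhid(int vhid, uint8_t lla[ETHER_ADDR_LEN])
{
	assert(lla != NULL);
	memset(lla, 0, ETHER_ADDR_LEN);
	if (vhid < 1 || vhid > 255)
		return -1;
	lla[2] = 0x5e;		/* assumed fixed prefix 00:00:5e:00:01 */
	lla[4] = 0x01;
	lla[5] = (uint8_t)vhid;	/* last octet carries the vhid */
	return 0;
}
```

Under these assumptions every unconfigured interface would produce the same address, which is why the uniqueness assumption in the test code could not hold.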
 1.113 12-Oct-2020  roy carp: link state is DOWN until it becomes a MASTER

This is consistent with other BSDs' handling of CARP and means
we don't have to carry a custom flag for it.
 1.112 12-Oct-2020  roy carp: Set ethernet address just before interface registration

Otherwise ifconfig reports SIOCGLIFADDR errors.
 1.111 09-Oct-2020  roy carp: Remove media, software should use link status.

carp literally has no media just like ppp, vlan, etc.
 1.110 06-Feb-2020  thorpej Perform link state change processing on a work queue, rather than in a
softint.
 1.109 04-Feb-2020  thorpej Use ifmedia_fini().
 1.108 29-Jan-2020  thorpej Adopt <net/if_stats.h>.
 1.107 20-Jan-2020  thorpej Remove FDDI support.
 1.106 19-Jan-2020  thorpej Remove Token Ring support.
 1.105 16-Jan-2020  kardel Provide SIOCGIFMEDIA ioctl to deliver link status.
Add link0 (IFF_LINK0) flag to map INIT state to LINK_STATE_DOWN
instead of LINK_STATE_UNKNOWN. This allows routing software to
suppress routes to the interface of the carp interface when in
init state (e. g. link down in the parent interface).
 1.104 10-Nov-2019  chs branches: 1.104.2;
in many device attach paths, allocate memory with M_WAITOK instead of M_NOWAIT
and remove code to handle failures that can no longer happen.
 1.103 01-Jun-2019  joerg Define carp6_cksum only when it is used, that is under INET6
 1.102 14-Mar-2019  ozaki-r carp: don't skip pserialize_read_enter and ifa_release
 1.101 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.100 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.99 26-Jun-2018  msaitoh branches: 1.99.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug where the direction was misdetected in some
environments, by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.98 14-Jun-2018  yamaguchi Take the lock when referring to the list included in ethercom, for safety

The lock is already held while adding and deleting
ok ozaki-r@
 1.97 14-Jun-2018  yamaguchi Use ether_lookup_multi() instead of the macro

ok ozaki-r@
 1.96 18-May-2018  maxv IP6_EXTHDR_GET -> M_REGION_GET, no functional change.
 1.95 21-Mar-2018  maxv Fix an untriggerable memory leak. carp_prepare_ad does not fail, so switch
it to void.
 1.94 06-Dec-2017  ozaki-r branches: 1.94.2;
Make if_link_queue MP-safe if IFEF_MPSAFE

if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.

Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.

Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
 1.93 22-Nov-2017  ozaki-r Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
 1.92 16-Nov-2017  ozaki-r Unify IFEF_*_MPSAFE into IFEF_MPSAFE

There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.

Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).

Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.

Proposed on tech-kern@ and tech-net@
 1.91 23-Oct-2017  msaitoh If if_initialize() failed in the attach function, free resources and return.
 1.90 19-May-2017  ozaki-r branches: 1.90.2;
Allow CARP to call the link_state_change handler immediately

If the handler is delayed because of the indirection call via softint,
some operations are executed in reverse and may cause unexpected
behaviors. For example, due to the issue a GARP packet wasn't sent on
a transition from the BACKUP state to the MASTER state; this happened
because IN_IFF_DETACHED flag wasn't cleared on arpannounce, which
had been cleared in the link_state_change handler.

This fixes an issue reported by sborrill@ on tech-net:
http://mail-index.netbsd.org/tech-net/2017/03/14/msg006283.html
 1.89 12-May-2017  ryo replace in_fmtaddr() by IN_PRINT(), and delete function in_fmtaddr()
 1.88 12-May-2017  roy carp should call if_link_state_change instead of affecting
if_link_state directly.
 1.87 19-Apr-2017  ozaki-r branches: 1.87.2;
Fix build without INET6
 1.86 14-Mar-2017  ozaki-r Use if_acquire and if_release instead of using psref API directly

- Provide if_release for consistency to if_acquire
- Use if_acquire and if_release for ifp iterations
- Make ifnet_psref_class static
 1.85 27-Feb-2017  ozaki-r Make CARP on IPv6 work

It passes ATF tests but no more, no less.
 1.84 02-Feb-2017  ozaki-r Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net
 1.83 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.82 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.81 28-Dec-2016  ozaki-r branches: 1.81.2;
Use ether_ifattach in carp_clone_create instead of C&P code

carp_clone_destroy calls ether_ifdetach so not calling ether_ifattach is
inconsistent. If we add something pair of initialization and destruction
to ether_ifattach and ether_ifdetach (e.g., mutex_init/mutex_destroy),
ether_ifdetach of carp_clone_destroy won't work. So use ether_ifattach.

In order to do so, make ether_ifattach accept the 2nd argument (lla) as
NULL to allow carp to initialize its link level address by itself.
 1.80 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.79 11-Oct-2016  roy Remove unused variable.
 1.78 11-Oct-2016  roy Mark arprequest static and introduce arpannounce so that gratuitous
ARP requests are only sent from valid addresses.
 1.77 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.76 23-Jul-2016  is Print the IPv6 or IPv4 source addresses of packets with wrong hash, to
help debugging.
 1.75 23-Jul-2016  is Workaround for PR 47013 by bouyer@. Only works for mixed IPv4/IPv6
environments, not for pure-IPv6 yet. A real fix is still needed.
 1.74 07-Jul-2016  ozaki-r branches: 1.74.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.73 06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we have
confirmed that nobody actually does.
 1.72 04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.71 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of an ifnet, so an ifnet object can
disappear at any time while we reach it via them. Thus we need to look up
an ifnet object by if_index every time, for safety.
 1.70 20-Jun-2016  knakahara fix: i386/ALL build failure
 1.69 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.68 16-Jun-2016  ozaki-r Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND
 1.67 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in the mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
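
The idea of storing an index instead of a pointer can be sketched in user space as follows. This is a single-threaded illustration only: all names (`fake_ifnet`, `fake_mbuf`, `fake_m_get_rcvif`, and so on) are hypothetical stand-ins, and the psref(9)/pserialize(9) protection that the real APIs provide is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IFS 8

struct fake_ifnet {
	int	if_index;	/* slot number, 1-based; 0 means "none" */
	int	if_alive;	/* cleared when the interface is destroyed */
};

struct fake_mbuf {
	int	rcvif_index;	/* index instead of a struct pointer */
};

/* Stand-in for the interface collection, indexed by if_index. */
static struct fake_ifnet if_table[MAX_IFS + 1];

/* Register an interface in slot `idx`; returns the index used. */
int
fake_if_attach(int idx)
{
	assert(idx >= 1 && idx <= MAX_IFS);
	if_table[idx].if_index = idx;
	if_table[idx].if_alive = 1;
	return idx;
}

/* Mark the interface in slot `idx` as destroyed. */
void
fake_if_detach(int idx)
{
	assert(idx >= 1 && idx <= MAX_IFS);
	if_table[idx].if_alive = 0;
}

/* Resolve the receiving interface, or NULL if it has gone away. */
struct fake_ifnet *
fake_m_get_rcvif(const struct fake_mbuf *m)
{
	int idx = m->rcvif_index;

	if (idx < 1 || idx > MAX_IFS || !if_table[idx].if_alive)
		return NULL;
	return &if_table[idx];
}
```

A packet that outlives its interface then sees a failed lookup instead of following a dangling pointer. In the kernel the returned object additionally has to be kept alive for the duration of its use, which is what the psref and pserialize variants are for.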
 1.66 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are the counterpart of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif handling and reduce the diff of
the upcoming change.

No functional change.
 1.65 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.64 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps the forthcoming locking (or psrefing) of rtentries.
 1.63 04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
  - sysctl(NET_RT_DUMP) doesn't return them
  - If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
  - RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
  - It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
  - -[no]cloning remains because it seems there are users
  - -[no]connected is introduced and recommended
    to be used instead of -[no]cloning
- route show/netstat -r drops some flags
  - 'L' and 'c' are not seen anymore
  - 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
  a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

For details of the behavior changes, see the diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.62 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.61 20-Aug-2015  christos include "ioconf.h" to get the 'void <driver>attach(int count);' prototype.
 1.60 26-Feb-2015  roy Introduce the routing flag RTF_LOCAL to track local address routes.
Add functions rt_ifa_addlocal() and rt_ifa_remlocal() to add and remove
local routes for the address and announce the new address and route
to the routing socket.

Add in_ifaddlocal() and in_ifremlocal() to use these functions.
Rename in6_if{add,rem}loop() to in6_if{add,rem}local() and use these
functions.

rtinit() no longer announces the address, just the network route for the
address. As such, calls to rt_newaddrmsg() have been removed from
in_addprefix() and in_scrubprefix().

This solves the problem of potentially more than one announcement, or no
announcement at all for the address in certain situations.
 1.59 31-Jul-2014  ozaki-r branches: 1.59.2; 1.59.4;
Make carp_suppress_preempt global again

It is still accessed by if_pfsync.c.

This unbreaks the build of i386 kernel.
 1.58 31-Jul-2014  ozaki-r Make local functions/variables static

No functional change.
 1.57 06-Jun-2014  rmind - Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.56 29-May-2014  rmind Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.55 17-May-2014  rmind - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
 1.54 13-May-2014  bouyer Make sure *(if_output)() is called with KERNEL_LOCK held.
Add some KASSERT for this.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details.
 1.53 04-Apr-2014  bouyer branches: 1.53.2;
Proper MBUFTRACE handling. Without it, ec_tx_mowner, ec_rx_mowner and
ifp->if_mowner would be used uninitialised.
 1.52 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.51 18-Oct-2013  christos remove unused variable
 1.50 20-Aug-2012  christos branches: 1.50.2; 1.50.4;
make this compile.
 1.49 20-Aug-2012  bouyer Support checksum offloading in carp(4) if the underlying device supports it,
as proposed on tech-net@ on 2 Aug 2012.
 1.48 27-Mar-2012  bouyer Do not sleep in callout context, this will hang the clock soft interrupt.
Should fix PR kern/46217.
 1.47 19-Nov-2011  tls branches: 1.47.2; 1.47.4;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.46 19-Oct-2011  dyoung branches: 1.46.2;
Use if_addr_init() and if_mcast_op() instead of ifp->if_ioctl().
 1.45 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.44 11-Aug-2010  pooka ahem, min -> max in previous
 1.43 11-Aug-2010  pooka Use kpause() instead of DELAY() and sleep a minimum of 1 tick.
This is possible now since softints have a thread context. It's
also not a very frequent code path. Addresses ABI issue with delay
(kern/40505).

I'm not entirely sure what this delay is meant to accomplish, though.
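
The "minimum of 1 tick" clamp, and why the follow-up revision 1.44 had to change min to max, can be sketched like this. The helper name `usec_to_ticks_min1` and the microsecond-to-tick conversion are assumptions for illustration; the actual expression in ip_carp.c is not shown in this log.

```c
#include <assert.h>

/*
 * Convert a delay in microseconds to scheduler ticks at `hz` ticks per
 * second, never returning fewer than one tick.
 */
int
usec_to_ticks_min1(long usec, int hz)
{
	long long t;

	assert(usec >= 0 && hz > 0);
	t = (long long)usec * hz / 1000000LL;
	/* max(t, 1): using min(t, 1) would cap every sleep at one tick. */
	return t > 1 ? (int)t : 1;
}
```

With a lower bound the short delays round up to one tick and the long ones are preserved, whereas the min form would turn every delay into at most a single tick.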
 1.42 10-Aug-2010  pooka Include opt_inet since this checks INET/INET6
 1.41 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.40 19-Jan-2010  pooka branches: 1.40.2; 1.40.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.39 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.38 07-Jun-2009  taca Make ip_carp.c compile, fixing usage of CARP_LOG().
 1.37 27-May-2009  christos PR/38260: Brian Seklecki: Improve carp logging.
 1.36 12-May-2009  elad Fix previous, || -> &&.

Pointed out by cube@, thanks!
 1.35 12-May-2009  elad Fix inverted permissions check.
 1.34 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.33 18-Mar-2009  cegger bcopy -> memcpy
 1.32 18-Mar-2009  cegger bzero -> memset
 1.31 18-Mar-2009  cegger bcmp -> memcmp
 1.30 11-Jan-2009  christos branches: 1.30.2;
merge christos-time_t
 1.29 19-Dec-2008  cegger use M_ZERO on malloc() and remove subsequent bzero().
 1.28 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.27 07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.
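The inheritance described above can be sketched as a chain of fallbacks: the driver handles what it knows and defers to its class routine, which in turn defers to the generic routine. The command numbers and function names are hypothetical stand-ins, not the real ioctl codes or the real ether_ioctl()/ifioctl_common() signatures.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command numbers, one per layer plus an unknown one. */
enum { CMD_GENERIC = 1, CMD_ETHER, CMD_DRIVER, CMD_UNKNOWN };

/* Generic layer (role of ifioctl_common()): the last resort. */
int
generic_ioctl_sketch(int cmd)
{
	switch (cmd) {
	case CMD_GENERIC:
		return 0;
	default:
		return ENOTTY;	/* inappropriate ioctl */
	}
}

/* Class layer (role of ether_ioctl()): inherits from the generic layer. */
int
class_ioctl_sketch(int cmd)
{
	switch (cmd) {
	case CMD_ETHER:
		return 0;
	default:
		return generic_ioctl_sketch(cmd);
	}
}

/* Driver layer: may override anything, inherits the rest from its class. */
int
driver_ioctl_sketch(int cmd)
{
	assert(cmd >= CMD_GENERIC && cmd <= CMD_UNKNOWN);
	switch (cmd) {
	case CMD_DRIVER:
		return 0;
	default:
		return class_ioctl_sketch(cmd);
	}
}
```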

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

	switch (...->sa_family) {
	case ...:
		..._init();
		...
		break;
	...
	default:
		..._init();
		...
		break;
	}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

	switch (x & (IFF_UP|IFF_RUNNING)) {
	case 0:
		...
		break;
	case IFF_RUNNING:
		...
		break;
	case IFF_UP:
		...
		break;
	case IFF_UP|IFF_RUNNING:
		...
		break;
	}
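A compilable version of that four-way switch might look like the following. The flag values are local stand-ins for IFF_UP and IFF_RUNNING, and the action chosen in each case is an assumption about what a typical driver does; the commit message leaves the case bodies elided.

```c
#include <assert.h>

#define SK_IFF_UP	0x0001	/* stand-in for IFF_UP */
#define SK_IFF_RUNNING	0x0040	/* stand-in for IFF_RUNNING */

/* Hypothetical actions; the original message elides the case bodies. */
enum sk_action { SK_NOTHING, SK_STOP, SK_START, SK_RELOAD };

enum sk_action
flags_action(unsigned flags)
{
	/* Mask first so unrelated flag bits cannot select a case. */
	switch (flags & (SK_IFF_UP | SK_IFF_RUNNING)) {
	case 0:
		return SK_NOTHING;	/* down and stopped */
	case SK_IFF_RUNNING:
		return SK_STOP;		/* marked down, still running */
	case SK_IFF_UP:
		return SK_START;	/* marked up, not yet running */
	case SK_IFF_UP | SK_IFF_RUNNING:
		return SK_RELOAD;	/* up and running: reprogram */
	}
	assert(0 && "all four combinations are handled above");
	return SK_NOTHING;
}
```

Because the masked value can take only four values, every permutation is visibly covered, which is the readability gain over nested if-else clauses.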

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
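The callback contract (ENETRESET means "reset by calling if_init()", 0 means success, anything else is an error) can be sketched like this. All names are hypothetical, the struct is not the real ethercom, and the behaviour when no callback is registered is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_ether {
	int (*ifflags_cb)(struct sk_ether *);	/* optional driver hook */
	int inits;		/* how often the interface was reset */
	int hw_can_update;	/* driver state: 1 if no reset is needed */
};

/* Role of ether_set_ifflags_cb(): remember the driver's hook. */
void
sk_set_ifflags_cb(struct sk_ether *ec, int (*cb)(struct sk_ether *))
{
	assert(ec != NULL);
	ec->ifflags_cb = cb;
}

/* Class-level handling of a change in the interface flags. */
int
sk_flags_changed(struct sk_ether *ec)
{
	int error;

	if (ec->ifflags_cb == NULL)
		return 0;	/* assumption: nothing to do without a hook */
	error = ec->ifflags_cb(ec);
	if (error == ENETRESET) {
		ec->inits++;	/* stands in for calling if_init() */
		return 0;
	}
	return error;		/* 0 on success, != 0 on failure */
}

/* Example driver hook: ask for a reset if the h/w cannot update in place. */
int
sk_driver_cb(struct sk_ether *ec)
{
	return ec->hw_can_update ? 0 : ENETRESET;
}
```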

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.
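As a rough illustration of preferring the factory address when deriving an EUI64: the sketch below picks a factory-assigned address if one exists and builds a modified EUI-64 from it (insert ff:fe in the middle, invert the universal/local bit). The types and function names are hypothetical and this is not the code of in6_get_hw_ifid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sk_lladdr {
	uint8_t	addr[6];
	bool	factory;	/* assigned by the factory? */
};

/* Prefer a factory address; otherwise fall back to the first one. */
const struct sk_lladdr *
sk_pick_lladdr(const struct sk_lladdr *v, size_t n)
{
	size_t i;

	assert(v != NULL && n > 0);
	for (i = 0; i < n; i++) {
		if (v[i].factory)
			return &v[i];
	}
	return &v[0];
}

/* Build a modified EUI-64 identifier from a 48-bit address. */
void
sk_mac_to_eui64(const uint8_t mac[6], uint8_t eui[8])
{
	memcpy(eui, mac, 3);
	eui[3] = 0xff;
	eui[4] = 0xfe;
	memcpy(eui + 5, mac + 3, 3);
	eui[0] ^= 0x02;	/* invert the universal/local bit */
}
```

Deriving the identifier from the factory address keeps it stable even if the administrator later changes the active link-layer address.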

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
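The address check can be sketched as below. lladdr_acceptable() is a hypothetical name and this is not the ether_ioctl() code; it only shows the two rejected cases: the group bit set (multicast or broadcast) and the all-zero address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETHER_ADDR_LEN 6

/* Return true if addr may be installed as a unicast link-layer address. */
bool
lladdr_acceptable(const uint8_t *addr)
{
	bool allzero = true;
	size_t i;

	assert(addr != NULL);
	/* Low bit of the first octet set: multicast or broadcast. */
	if (addr[0] & 0x01)
		return false;
	for (i = 0; i < SK_ETHER_ADDR_LEN; i++) {
		if (addr[i] != 0)
			allzero = false;
	}
	return !allzero;	/* reject 00:00:00:00:00:00 */
}
```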

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.26 04-May-2008  thorpej branches: 1.26.6; 1.26.8; 1.26.10; 1.26.14;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.25 23-Apr-2008  thorpej branches: 1.25.2;
Use <net/net_stats.h> / netstat_sysctl().
 1.24 15-Apr-2008  thorpej branches: 1.24.2;
Make CARP status per-cpu.
 1.23 15-Mar-2008  ws branches: 1.23.2;
Set scope on IPv6 multicast address to give carp a chance to work for IPv6, too.
From FreeBSD.
 1.22 21-Dec-2007  matt branches: 1.22.2; 1.22.6;
Add fix for ip_id information leakage. Since the leakage information is
primarily used with TCP SYN and RST packets and such packets are less than
the smallest sized packet that an IP stack is allowed to fragment, we simply
set ip_id to 0 for all packets 68 bytes or less.
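A sketch of that rule, with a plain counter standing in for whatever ID generator the stack really uses. sk_choose_ip_id() and its counter argument are hypothetical; only the 68-byte threshold comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NOFRAG_MAX 68	/* packets this small get ip_id 0 */

/*
 * Pick the IP identification value for an outgoing packet.  Small
 * packets get 0 and do not consume an ID, so they reveal nothing
 * about the sender's ID sequence.
 */
uint16_t
sk_choose_ip_id(size_t pktlen, uint16_t *counter)
{
	assert(counter != NULL);
	if (pktlen <= SK_NOFRAG_MAX)
		return 0;
	return (*counter)++;
}
```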
 1.21 20-Dec-2007  dyoung Constify struct ifnet->if_sadl and every use throughout the tree.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
 1.20 11-Dec-2007  lukem use __KERNEL_RCSID()
 1.19 04-Dec-2007  dyoung branches: 1.19.2; 1.19.4;
Use IFADDR_FOREACH().
 1.18 19-Oct-2007  ad branches: 1.18.2; 1.18.4;
machine/{bus,cpu,intr}.h -> sys/{bus,cpu,intr}.h
 1.17 19-Sep-2007  dyoung branches: 1.17.4;
Constify sockaddr argument to ether_multiaddr(). Change struct
ifreq * arguments to ether_addmulti() and ether_delmulti() to const
struct sockaddr *, since ether_{add,del}multi() only ever read the
sockaddr ifreq member, ifr_addr. Update uses in carp(4) and in
vlan(4).
 1.16 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
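To illustrate why passing the sockaddr length prevents overruns, here is a simplified model. struct sk_sdl is a hypothetical record, far smaller than the real sockaddr_dl, and the two functions only mimic the roles of sockaddr_dl_alloc() and sockaddr_dl_setaddr(); their real signatures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified variable-length link-layer address record. */
struct sk_sdl {
	uint8_t	len;	/* total size of this record in bytes */
	uint8_t	alen;	/* bytes of address currently stored */
	uint8_t	data[];	/* address bytes follow the header */
};

/* Allocate a zeroed record of exactly socklen bytes on the heap. */
struct sk_sdl *
sk_sdl_alloc(size_t socklen)
{
	struct sk_sdl *sdl;

	if (socklen < offsetof(struct sk_sdl, data) || socklen > UINT8_MAX)
		return NULL;
	sdl = calloc(1, socklen);
	if (sdl != NULL)
		sdl->len = (uint8_t)socklen;
	return sdl;
}

/* Store an address; refuse (return NULL) if it would not fit in socklen. */
struct sk_sdl *
sk_sdl_setaddr(struct sk_sdl *sdl, size_t socklen, const void *addr,
    uint8_t addrlen)
{
	assert(sdl != NULL && addr != NULL);
	if (socklen < offsetof(struct sk_sdl, data) + (size_t)addrlen)
		return NULL;
	memcpy(sdl->data, addr, addrlen);
	sdl->alen = addrlen;
	return sdl;
}
```

A bare memcpy(LLADDR(), ...) has no such length to check against, which is the abuse the constification is meant to expose.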
 1.15 26-Aug-2007  dyoung branches: 1.15.2;
Constify: LLADDR -> CLLADDR. I'm aiming here to make it easier to
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).

Use sockaddr_dl_setaddr() in a few places.
 1.14 19-Jul-2007  dyoung branches: 1.14.4; 1.14.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
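Applying a netmask to a destination before the lookup amounts to a bytewise AND. The helper below is a hypothetical simplification that works on raw address bytes of equal length; the real rt_maskedcopy() operates on sockaddrs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dst[i] = src[i] & mask[i] for len bytes of raw address data. */
void
sk_apply_mask(uint8_t *dst, const uint8_t *src, const uint8_t *mask,
    size_t len)
{
	size_t i;

	assert(dst != NULL && src != NULL && mask != NULL);
	for (i = 0; i < len; i++)
		dst[i] = src[i] & mask[i];
}
```

With a mask of 255.255.255.0, a request for 192.168.1.77 is therefore looked up as 192.168.1.0.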

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.13 09-Jul-2007  ad branches: 1.13.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.12 04-Mar-2007  christos branches: 1.12.2; 1.12.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.11 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.
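The difference between the two operations can be modelled as follows. struct sk_route and the sk_*() functions are hypothetical; the destination is a string rather than a sockaddr, and the cached route is a bare pointer with no reference counting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sk_route {
	void	*rt;	/* cached route, or NULL */
	char	*dst;	/* heap-allocated destination, or NULL */
};

/* Role of rtcache_setdst(): replace the externally held destination. */
int
sk_setdst(struct sk_route *ro, const char *dst)
{
	size_t n;
	char *copy;

	assert(ro != NULL && dst != NULL);
	n = strlen(dst) + 1;
	copy = malloc(n);
	if (copy == NULL)
		return -1;
	memcpy(copy, dst, n);
	free(ro->dst);
	ro->dst = copy;
	return 0;
}

/* Role of rtcache_clear(): forget the route, keep the destination. */
void
sk_clear(struct sk_route *ro)
{
	ro->rt = NULL;
}

/* Role of rtcache_free(): forget the route and release the destination. */
void
sk_free(struct sk_route *ro)
{
	sk_clear(ro);
	free(ro->dst);
	ro->dst = NULL;
}
```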

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.10 16-Nov-2006  christos branches: 1.10.4;
__unused removal on arguments; approved by core.
 1.9 30-Oct-2006  christos Fix typo (hi Elad)
 1.8 25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER.
 1.7 20-Oct-2006  liamjfoy Remove some dead code - From OpenBSD Rev. 1.129
 1.6 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.5 23-Jul-2006  ad branches: 1.5.4; 1.5.6; 1.5.8;
Use the LWP cached credentials where sane.
 1.4 13-Jun-2006  riz branches: 1.4.4;
Remove implementation of tvtohz() - since the timecounters branch
was merged, this is now in sys/kern/kern_clock.c .
 1.3 25-May-2006  liamjfoy branches: 1.3.2;
remove a little white space
 1.2 24-May-2006  liamjfoy branches: 1.2.2;
Add a check for our own advertisements. This is due to non-simplex
interfaces, which receive the packets they have just sent.

From: OpenBSD (rev. 1.124)
ok: christos@
 1.1 18-May-2006  liamjfoy branches: 1.1.2; 1.1.4;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.1.4.4 11-Aug-2006  yamt sync with head
 1.1.4.3 26-Jun-2006  yamt sync with head.
 1.1.4.2 24-May-2006  yamt sync with head.
 1.1.4.1 18-May-2006  yamt file ip_carp.c was added on branch yamt-pdpolicy on 2006-05-24 10:59:03 +0000
 1.1.2.1 19-Jun-2006  chap Sync with head.
 1.2.2.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.2.2.1 24-May-2006  tron file ip_carp.c was added on branch peter-altq on 2006-05-24 15:50:44 +0000
 1.3.2.2 01-Jun-2006  kardel Sync with head.
 1.3.2.1 25-May-2006  kardel file ip_carp.c was added on branch simonb-timecounters on 2006-06-01 22:38:46 +0000
 1.4.4.9 17-Mar-2008  yamt sync with head.
 1.4.4.8 21-Jan-2008  yamt sync with head
 1.4.4.7 07-Dec-2007  yamt sync with head
 1.4.4.6 27-Oct-2007  yamt sync with head.
 1.4.4.5 03-Sep-2007  yamt sync with head.
 1.4.4.4 26-Feb-2007  yamt sync with head.
 1.4.4.3 30-Dec-2006  yamt sync with head.
 1.4.4.2 21-Jun-2006  yamt sync with head.
 1.4.4.1 13-Jun-2006  yamt file ip_carp.c was added on branch yamt-lazymbuf on 2006-06-21 15:11:01 +0000
 1.5.8.2 10-Dec-2006  yamt sync with head.
 1.5.8.1 22-Oct-2006  yamt sync with head
 1.5.6.2 09-Sep-2006  rpaulo sync with head
 1.5.6.1 23-Jul-2006  rpaulo file ip_carp.c was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:58:47 +0000
 1.5.4.1 18-Nov-2006  ad Sync with head.
 1.10.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.10.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.12.4.1 11-Jul-2007  mjf Sync with head.
 1.12.2.4 23-Oct-2007  ad Sync with head.
 1.12.2.3 09-Oct-2007  ad Sync with head.
 1.12.2.2 20-Aug-2007  ad Sync with HEAD.
 1.12.2.1 01-Jul-2007  ad Adapt to callout API change.
 1.13.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.13.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.14.6.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.14.6.1 19-Jul-2007  dyoung file ip_carp.c was added on branch matt-mips64 on 2007-07-19 20:48:55 +0000
 1.14.4.4 09-Dec-2007  jmcneill Sync with HEAD.
 1.14.4.3 26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.14.4.2 02-Oct-2007  joerg Sync with HEAD.
 1.14.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.15.2.3 23-Mar-2008  matt sync with HEAD
 1.15.2.2 09-Jan-2008  matt sync with HEAD
 1.15.2.1 06-Nov-2007  matt sync with HEAD
 1.17.4.1 25-Oct-2007  bouyer Sync with HEAD.
 1.18.4.2 26-Dec-2007  ad Sync with head.
 1.18.4.1 08-Dec-2007  ad Sync with head.
 1.18.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.18.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.19.4.2 02-Jan-2008  bouyer Sync with HEAD
 1.19.4.1 13-Dec-2007  bouyer Sync with HEAD
 1.19.2.1 11-Dec-2007  yamt sync with head.
 1.22.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.22.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.22.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.22.2.1 24-Mar-2008  keiichi sync with head.
 1.23.2.4 27-Dec-2008  christos merge with head.
 1.23.2.3 09-Nov-2008  christos merge with head.
 1.23.2.2 01-Nov-2008  christos Sync with head.
 1.23.2.1 29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.24.2.1 18-May-2008  yamt sync with head.
 1.25.2.7 09-Oct-2010  yamt sync with head
 1.25.2.6 11-Aug-2010  yamt sync with head.
 1.25.2.5 11-Mar-2010  yamt sync with head
 1.25.2.4 20-Jun-2009  yamt sync with head
 1.25.2.3 16-May-2009  yamt sync with head
 1.25.2.2 04-May-2009  yamt sync with head.
 1.25.2.1 16-May-2008  yamt sync with head.
 1.26.14.1 21-Apr-2010  matt sync to netbsd-5
 1.26.10.2 09-Jun-2009  snj Pull up following revision(s) (requested by taca in ticket #796):
sys/netinet/ip_carp.c: revision 1.38
Make ip_carp.c compile, fixing usage of CARP_LOG().
 1.26.10.1 05-Jun-2009  snj Pull up following revision(s) (requested by christos in ticket #785):
sys/netinet/ip_carp.c: revision 1.37
PR/38260: Brian Seklecki: Improve carp logging.
 1.26.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.26.8.1 19-Jan-2009  skrll Sync with HEAD.
 1.26.6.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.30.2.2 23-Jul-2009  jym Sync with HEAD.
 1.30.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.40.4.2 05-Mar-2011  rmind sync with head
 1.40.4.1 30-May-2010  rmind sync with head
 1.40.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.40.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.46.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.46.2.2 30-Oct-2012  yamt sync with head
 1.46.2.1 17-Apr-2012  yamt sync with head
 1.47.4.5 28-Aug-2016  bouyer Pull up following revision(s) (requested by is in ticket #1393):
sys/netinet/ip_carp.c: revision 1.75
Workaround for PR 47013 by bouyer@. Only works for mixed IPv4/IPv6
environments, not for pure-IPv6 yet. A real fix is still needed.
 1.47.4.4 27-Aug-2016  bouyer Pull up following revision(s) (requested by is in ticket #1394):
sys/netinet/ip_carp.c: revision 1.76
Print the IPv6 or IPv4 source addresses of packets with wrong hash, to
help debugging.
 1.47.4.3 03-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.47.4.2 11-Apr-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1043):
sys/netinet/ip_carp.c: revision 1.53
Proper MBUFTRACE handling. Without it, ec_tx_mowner, ec_rx_mowner and
ifp->if_mowner would be used uninitialised.
 1.47.4.1 02-Apr-2012  riz branches: 1.47.4.1.4; 1.47.4.1.6;
Pull up following revision(s) (requested by bouyer in ticket #145):
sys/netinet/ip_carp.c: revision 1.48
Do not sleep in callout context, this will hang the clock soft interrupt.
Should fix PR kern/46217.
 1.47.4.1.6.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.47.4.1.4.1 18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.47.2.1 05-Apr-2012  mrg sync to latest -current.
 1.50.4.2 18-May-2014  rmind sync with head
 1.50.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.50.2.2 03-Dec-2017  jdolecek update from HEAD
 1.50.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.53.2.1 10-Aug-2014  tls Rebase.
 1.59.4.9 28-Aug-2017  skrll Sync with HEAD
 1.59.4.8 05-Feb-2017  skrll Sync with HEAD
 1.59.4.7 05-Dec-2016  skrll Sync with HEAD
 1.59.4.6 05-Oct-2016  skrll Sync with HEAD
 1.59.4.5 09-Jul-2016  skrll Sync with HEAD
 1.59.4.4 29-May-2016  skrll Sync with HEAD
 1.59.4.3 22-Apr-2016  skrll Sync with HEAD
 1.59.4.2 22-Sep-2015  skrll Sync with HEAD
 1.59.4.1 06-Apr-2015  skrll Sync with HEAD
 1.59.2.5 12-May-2017  sborrill Pull up the following revision(s) (requested by roy in ticket #1420):
sys/netinet/ip_carp.c: revision 1.88

carp should call if_link_state_change instead of affecting
if_link_state directly.
 1.59.2.4 27-Aug-2016  snj Pull up following revision(s) (requested by is in ticket #1209):
sys/netinet/ip_carp.c: revision 1.76
Print the IPv6 or IPv4 source addresses of packets with wrong hash, to
help debugging.
 1.59.2.3 27-Aug-2016  snj Pull up following revision(s) (requested by is in ticket #1208):
sys/netinet/ip_carp.c: revision 1.75
Workaround for PR 47013 by bouyer@. Only works for mixed IPv4/IPv6
environments, not for pure-IPv6 yet. A real fix is still needed.
 1.59.2.2 23-Jul-2016  is backout last change (wrong branch).
 1.59.2.1 23-Jul-2016  is Log the IPv4/IPv6 source of incorrect hash packets, too. Needed for
meaningful debugging.
 1.74.2.6 26-Apr-2017  pgoyette Sync with HEAD
 1.74.2.5 20-Mar-2017  pgoyette Sync with HEAD
 1.74.2.4 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.74.2.3 04-Nov-2016  pgoyette Sync with HEAD
 1.74.2.2 06-Aug-2016  pgoyette Sync with HEAD
 1.74.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.81.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.87.2.1 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.90.2.4 19-Mar-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1218):

sys/netinet/ip_carp.c: revision 1.102

carp: don't skip pserialize_read_enter and ifa_release
 1.90.2.3 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
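The macro-wrapping idea can be sketched in user space as below. Every name is a stand-in: a counter plays the role of the kernel lock, SK_NET_MPSAFE plays the role of the NET_MPSAFE option, and the macro names are not claimed to match those in the tree.

```c
#include <assert.h>

int sk_kernel_lock_depth;	/* stand-in for the big kernel lock */

#define SK_KERNEL_LOCK()	(sk_kernel_lock_depth++)
#define SK_KERNEL_UNLOCK()	(sk_kernel_lock_depth--)

/* The build-time switch is tested once, here, not at every call site. */
#ifndef SK_NET_MPSAFE
#define SK_LOCK_UNLESS_NET_MPSAFE()	SK_KERNEL_LOCK()
#define SK_UNLOCK_UNLESS_NET_MPSAFE()	SK_KERNEL_UNLOCK()
#else
#define SK_LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define SK_UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#endif

/* A call site: no #ifndef block is needed around the lock calls. */
int
sk_do_work(void)
{
	int depth_seen;

	SK_LOCK_UNLESS_NET_MPSAFE();
	depth_seen = sk_kernel_lock_depth;	/* "work" done under the lock */
	SK_UNLOCK_UNLESS_NET_MPSAFE();
	assert(sk_kernel_lock_depth >= 0);
	return depth_seen;
}
```

Grepping for the wrapper then finds every conditional lock site, while any remaining direct lock call stands out as unconditional.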
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires adding KERNEL_LOCK to subsequent functions called from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.90.2.2 10-Dec-2017  snj Pull up following revision(s) (requested by msaitoh in ticket #427):
sys/arch/amiga/dev/if_bah_zbus.c: 1.17
sys/arch/arm/broadcom/bcm53xx_eth.c: 1.30
sys/arch/powerpc/booke/dev/pq3etsec.c: 1.32
sys/arch/usermode/dev/if_veth.c: 1.9
sys/dev/ic/an.c: 1.66
sys/dev/ic/athn.c: 1.17
sys/dev/ic/atw.c: 1.162
sys/dev/ic/bwi.c: 1.33
sys/dev/ic/dwc_gmac.c: 1.41-1.42
sys/dev/ic/malo.c: 1.10
sys/dev/ic/rt2560.c: 1.31
sys/dev/ic/rt2661.c: 1.36
sys/dev/ic/rt2860.c: 1.29
sys/dev/ic/rtw.c: 1.127
sys/dev/ic/rtwvar.h: 1.46
sys/dev/ic/smc90cx6.c: 1.71
sys/dev/ic/smc90cx6var.h: 1.12
sys/dev/ic/wi.c: 1.244
sys/dev/pci/if_ipw.c: 1.66
sys/dev/pci/if_iwi.c: 1.104
sys/dev/pci/if_iwm.c: 1.76
sys/dev/pci/if_iwn.c: 1.86
sys/dev/pci/if_rtwn.c: 1.13
sys/dev/pci/if_wm.c: 1.541
sys/dev/pci/if_wpi.c: 1.79
sys/dev/pci/ixgbe/ixgbe.c: 1.106
sys/dev/pci/ixgbe/ixv.c: 1.73 via patch
sys/dev/pcmcia/if_malo_pcmcia.c: 1.15
sys/dev/scsipi/if_se.c: 1.95
sys/dev/usb/if_upl.c: 1.60
sys/net/if.c: 1.396
sys/net/if.h: 1.241
sys/net/if_arc.h: 1.23
sys/net/if_arcsubr.c: 1.78
sys/net/if_bridge.c: 1.136-1.137
sys/net/if_etherip.c: 1.39
sys/net/if_faith.c: 1.56
sys/net/if_gif.c: 1.131
sys/net/if_loop.c: 1.96
sys/net/if_mpls.c: 1.30
sys/net/if_pppoe.c: 1.129
sys/net/if_srt.c: 1.27
sys/net/if_stf.c: 1.102
sys/net/if_tap.c: 1.100
sys/net/if_vlan.c: 1.105
sys/netinet/ip_carp.c: 1.91
sys/rump/net/lib/libshmif/if_shmem.c: 1.73-1.74
sys/rump/net/lib/libvirtif/if_virt.c: 1.55-1.56
if_initialize() and if_attach() can fail when resource allocation fails
(e.g. allocating softint). Without this change, they panic. That is bad because
resource shortage really occurs when many pseudo interfaces are created.
To avoid this problem, don't panic and change the return type of if_initialize()
and if_attach() to int. The calling function can recover from the error cleanly by
checking the return value.
Return if bah_attach_subr() failed.
If if_attach() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add some missing frees in bridge_clone_destroy().
- KNF
If an error occurred in bcmeth_ccb_attach(), free resources and return.
If an error occurred in pq3etsec_attach(), free resources and return.
If an error occurred in the attach function, free resources and return.
- If if_initialize() failed in athn_attach(), free resources and return.
- Add missing pmf_event_deregister() in athn_detach().
- Free resources correctly on some errors in atw_attach().
- Use aprint*() instead of printf() in the attach function.
If if_initialize() failed in the attach function, return.
- If if_initialize() failed in the attach function, free resources and return.
- Add missing dwc_gmac_free_dma_rings() and mutex_destroy() when attach
failed.
- If if_initialize() failed in the attach function, free resources and return.
- ifp is never NULL in iwi_detach(). Check correctly with ifp->if_softc.
- If if_initialize() failed in the attach function, free resources and return.
- Fix error path in the attach function correctly.
If if_initialize() failed in the attach function, free resources and return.
If if_attach() failed in the attach function, free resources and return.
- If if_initialize() failed in the attach function, free resources and return.
- KNF
- If if_attach() failed in the attach function, free resources and return.
- KNF
Fix compile error.
Fix compile error.
We don't need '&mii', but just 'mii' for mii_detach().
Don't free sc_rthash twice
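The error-handling pattern described in the pull-up above (an initialization routine that returns int instead of panicking, and an attach routine that frees its resources and returns on failure) can be sketched roughly as follows. This is only an illustration: `demo_softc`, `demo_if_initialize()` and `demo_attach()` are hypothetical stand-ins, not the real NetBSD structures or functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real NetBSD structures or functions. */
struct demo_softc {
	void *sc_ring;		/* resource allocated during attach */
	bool  sc_attached;	/* true once attach has fully succeeded */
};

/* Simulates an initializer that returns 0 or an errno value. */
static int
demo_if_initialize(bool simulate_shortage)
{
	return simulate_shortage ? ENOMEM : 0;
}

/* Attach routine: on failure, release what was allocated and return. */
static int
demo_attach(struct demo_softc *sc, bool simulate_shortage)
{
	int error;

	assert(sc != NULL);
	sc->sc_attached = false;
	sc->sc_ring = malloc(64);
	if (sc->sc_ring == NULL)
		return ENOMEM;

	error = demo_if_initialize(simulate_shortage);
	if (error != 0) {
		/* Do not panic: undo the allocation and report the error. */
		free(sc->sc_ring);
		sc->sc_ring = NULL;
		return error;
	}
	sc->sc_attached = true;
	return 0;
}
```

The caller decides what to do with a nonzero return value; the attach routine only guarantees that nothing it allocated is left behind.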
 1.90.2.1 30-Nov-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #407):
sys/compat/linux32/common/linux32_socket.c: revision 1.28
sys/net/if.c: revision 1.400
sys/netipsec/key.c: revision 1.243
sys/compat/linux/common/linux_socket.c: revision 1.139
sys/netinet/ip_carp.c: revision 1.93
sys/netinet6/in6.c: revision 1.252
sys/netinet6/in6.c: revision 1.253
sys/netinet6/in6.c: revision 1.254
sys/net/if_spppsubr.c: revision 1.173
sys/net/if_spppsubr.c: revision 1.174
sys/compat/common/uipc_syscalls_40.c: revision 1.14
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Fix usage of FOREACH macro
key_sad.lock is held there so SAVLIST_WRITER_FOREACH is enough.
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref
Protect IFADDR_READER_FOREACH and obtained ifa with psz/psref (more)
Fix usages of psz/psref in ifconf variants and make them consistent
Remove unnecessary goto because there is no cleanup code to share (NFC)
Tweak a condition; we don't need to care about ifacount being negative
Fix a race condition of in6_ifinit
in6_ifinit checks the number of IPv6 addresses on a given interface and
if it's zero (i.e., an IPv6 address being assigned to the interface
is the first one), call if_addr_init. However, the actual assignment of
the address (ifa_insert) is out of in6_ifinit. The check and the
assignment must be done atomically.
Fix it by holding in6_ifaddr_lock during in6_ifinit and ifa_insert.
And also add missing pserialize to IFADDR_READER_FOREACH.
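The race fix described above (doing the "is this the first address?" check and the insertion under one lock) can be sketched in userland C. This is a simplified model under stated assumptions: `demo_ifnet`, `demo_addr_add()` and the pthread mutex are hypothetical stand-ins for the interface, the in6_ifinit/ifa_insert pair and in6_ifaddr_lock; the kernel code uses its own locking primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of an interface's address list; not NetBSD code. */
struct demo_ifnet {
	pthread_mutex_t addr_lock;	/* plays the role of in6_ifaddr_lock */
	int naddrs;			/* number of addresses assigned */
	int ninit;			/* times first-address init ran */
};

#define DEMO_IFNET_INITIALIZER { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/*
 * Check for the first address and insert under a single lock hold,
 * so two concurrent callers cannot both observe naddrs == 0.
 * Returns true if this call performed the first-address init.
 */
static bool
demo_addr_add(struct demo_ifnet *ifp)
{
	bool first;

	assert(ifp != NULL);
	pthread_mutex_lock(&ifp->addr_lock);
	first = (ifp->naddrs == 0);
	if (first)
		ifp->ninit++;	/* stands in for if_addr_init */
	ifp->naddrs++;		/* stands in for ifa_insert */
	pthread_mutex_unlock(&ifp->addr_lock);
	return first;
}
```

If the lock were dropped between the check and the increment, two callers could both see zero addresses and both run the first-address initialization.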
 1.94.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.94.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.94.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.94.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.94.2.2 21-May-2018  pgoyette Sync with HEAD
 1.94.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.99.2.3 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.99.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.99.2.1 10-Jun-2019  christos Sync with HEAD
 1.104.2.3 29-Feb-2020  ad Sync with head.
 1.104.2.2 25-Jan-2020  ad Sync with head.
 1.104.2.1 17-Jan-2020  ad Sync with head.
 1.114.6.1 17-Jun-2021  thorpej Sync w/ HEAD.
 1.117.4.1 21-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #902):

sbin/ifconfig/carp.c: revision 1.15
sbin/ifconfig/ifconfig.8: revision 1.125
tests/net/carp/t_basic.sh: revision 1.9
sys/netinet/ip_carp.c: revision 1.118
sys/netinet/ip_carp.c: revision 1.119

Fix parser for carp state.

The state values are uppercase words INIT, BACKUP and MASTER.

Use backing device to send advertisements. Otherwise the packets originate
from the virtual MAC address, which confuses switches.

Select virtual address as sender if backing interface is anonymous.

Use correct scope for IPv6.

Don't expect the net/carp/t_basic/carp_handover_ipv6_halt_nocarpdevip
and carp_handover_ipv6_ifdown_nocarpdevip test cases to fail. At
least on the TNF i386 and amd64 testbeds, they pass more often than
not since the commit of src/sys/netinet/ip_carp.c 1.119 by mlelstv on
2023.04.07.06.44.08.
 1.120.6.1 02-Aug-2025  perseant Sync with HEAD
 1.14 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.13 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.12 12-Oct-2020  roy branches: 1.12.2;
carp: link state is DOWN until it becomes a MASTER

This is consistent with other BSDs' handling of CARP and means
we don't have to carry a custom flag for it.
 1.11 16-Jan-2020  kardel Provide SIOCGIFMEDIA ioctl to deliver link status.
Add link0 (IFF_LINK0) flag to map INIT state to LINK_STATE_DOWN
instead of LINK_STATE_UNKNOWN. This allows routing software to
suppress routes to the interface of the carp interface when in
init state (e. g. link down in the parent interface).
 1.10 14-Sep-2018  maxv branches: 1.10.6;
Use non-variadic function pointer in protosw::pr_input.
 1.9 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.8 28-Apr-2016  ozaki-r branches: 1.8.16; 1.8.18;
Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) of rtentries.
 1.7 31-Jul-2014  ozaki-r branches: 1.7.4;
KNF
 1.6 16-Sep-2009  pooka branches: 1.6.22; 1.6.36;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.5 16-Apr-2008  dyoung branches: 1.5.4;
C99 does not allow u_int8_t bitfields, so use unsigned int, instead.
 1.4 15-Apr-2008  thorpej Make CARP status per-cpu.
 1.3 17-Feb-2007  dyoung branches: 1.3.38;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
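The const conversion helpers mentioned in item 4 can be pictured with a small userland sketch. `demo_satocsin()` and `demo_port_of()` are hypothetical names chosen for illustration; the kernel's own macros may be defined differently.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical analogue of a const-preserving sockaddr conversion. */
static inline const struct sockaddr_in *
demo_satocsin(const struct sockaddr *sa)
{
	return (const struct sockaddr_in *)(const void *)sa;
}

/*
 * An output-style routine that takes a const destination: it can read
 * the address through the const view but cannot modify it.
 */
static in_port_t
demo_port_of(const struct sockaddr *dst)
{
	assert(dst->sa_family == AF_INET);
	return demo_satocsin(dst)->sin_port;
}
```

Keeping the conversion const-qualified means a routine that receives a const destination never needs a cast that silently discards the qualifier.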
 1.2 13-Jun-2006  riz branches: 1.2.4; 1.2.10; 1.2.16;
Prototype for tvtohz() is no longer needed here.
 1.1 18-May-2006  liamjfoy branches: 1.1.2; 1.1.4; 1.1.6; 1.1.8;
Integrate Common Address Redundancy Protocol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.1.8.2 01-Jun-2006  kardel Sync with head.
 1.1.8.1 18-May-2006  kardel file ip_carp.h was added on branch simonb-timecounters on 2006-06-01 22:38:47 +0000
 1.1.6.2 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.1.6.1 18-May-2006  tron file ip_carp.h was added on branch peter-altq on 2006-05-24 15:50:45 +0000
 1.1.4.3 26-Jun-2006  yamt sync with head.
 1.1.4.2 24-May-2006  yamt sync with head.
 1.1.4.1 18-May-2006  yamt file ip_carp.h was added on branch yamt-pdpolicy on 2006-05-24 10:59:03 +0000
 1.1.2.1 19-Jun-2006  chap Sync with head.
 1.2.16.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.2.10.2 09-Sep-2006  rpaulo sync with head
 1.2.10.1 13-Jun-2006  rpaulo file ip_carp.h was added on branch rpaulo-netinet-merge-pcb on 2006-09-09 02:58:47 +0000
 1.2.4.3 26-Feb-2007  yamt sync with head.
 1.2.4.2 21-Jun-2006  yamt sync with head.
 1.2.4.1 13-Jun-2006  yamt file ip_carp.h was added on branch yamt-lazymbuf on 2006-06-21 15:11:01 +0000
 1.3.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.5.4.1 11-Mar-2010  yamt sync with head
 1.6.36.1 10-Aug-2014  tls Rebase.
 1.6.22.2 03-Dec-2017  jdolecek update from HEAD
 1.6.22.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.7.4.1 29-May-2016  skrll Sync with HEAD
 1.8.18.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.8.18.1 10-Jun-2019  christos Sync with HEAD
 1.8.16.2 30-Sep-2018  pgoyette Sync with HEAD
 1.8.16.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.10.6.1 17-Jan-2020  ad Sync with head.
 1.12.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.40 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.39 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.38 09-May-2004  christos PR/25441: Matthew Green: IP-Filter uses M_TEMP when it already has M_IPFILTER
 1.37 31-Mar-2004  dyoung Only #define COPYIN copyin, et cetera, in the kernel. That is, only
when _KERNEL is defined.
 1.36 31-Mar-2004  darrenr COPYIN/COPYOUT macros need to call copyin/out on NetBSD rather than just use
bcopy.
 1.35 28-Mar-2004  martti branches: 1.35.2;
Upgraded IPFilter to 4.1.1
 1.34 26-Jun-2003  itojun branches: 1.34.2;
tabify
 1.33 05-Mar-2003  ragge vax -> __vax__. Didn't I fix this a year ago?
 1.32 24-Feb-2003  thorpej Comment out the inclusion of <uvm/uvm_extern.h> -- the header is not
necessary.
 1.31 19-Sep-2002  martti Resync with official IPF
 1.30 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.29 09-Jun-2002  itojun whitespace
 1.28 02-May-2002  martti branches: 1.28.2; 1.28.4;
Upgraded IPFilter to 3.4.27
 1.27 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.26 24-Jan-2002  martti Re-sync with IPFilter
 1.25 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.24 26-May-2001  ragge branches: 1.24.2;
defined(vax) -> defined(__vax__). This may fix PR#12919.
 1.23 12-Apr-2001  thorpej Delete SPL_IMP(). It is not used in IP Filter, and it aids me
on my quest to eliminate the foul beast known as splimp.
 1.22 26-Mar-2001  mike Resolve conflicts.
 1.21 05-Feb-2001  chs branches: 1.21.2;
expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.20 28-Jun-2000  mrg remove include of <vm/vm.h>
 1.19 26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.18 03-May-2000  veego branches: 1.18.4;
Resolve conflicts.
 1.17 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.16 22-Nov-1998  mrg branches: 1.16.4; 1.16.10; 1.16.16;
merge ipf 3.2.10
 1.15 12-Jul-1998  veego Resolve conflicts from the import.
 1.14 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.13 17-May-1998  veego Resolve conflicts
 1.12 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.11 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.10 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.9 21-Sep-1997  veego branches: 1.9.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.8 06-Jul-1997  thorpej branches: 1.8.2;
Restore original RCS IDs.
 1.7 05-Jul-1997  darrenr fix conflicts from import
 1.6 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.5 25-May-1997  darrenr fix conflicts
 1.4 15-Apr-1997  christos Fix SPLNET() conditional to work for NetBSD1_0+, not just the named versions.
 1.3 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.2 05-Jan-1997  veego Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.21 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.20 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.19 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.18 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.17 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.16 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.15 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.14 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.13 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.12 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.11 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.10 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.8.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.9.2.4 24-Nov-1998  cgd pull up rev(s) 1.16 from trunk (ipfilter 3.2.10). (mrg)
 1.9.2.3 22-Jul-1998  mellon Pull up 1.15 (veego)
 1.9.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.9.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.16.16.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.16.10.4 21-Apr-2001  bouyer Sync with HEAD
 1.16.10.3 27-Mar-2001  bouyer Sync with HEAD.
 1.16.10.2 11-Feb-2001  bouyer Sync with HEAD.
 1.16.10.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.4.1 20-Dec-1999  he Pull up revision 1.17 (requested by darrenr):
Update IPF to version 3.3.5.
 1.18.4.2 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.18.4.1 09-Feb-2002  he Pull up revisions 1.19-1.26 (requested by martti):
Updated IPFilter to 3.4.23
 1.21.2.7 20-Sep-2002  thorpej Sync with HEAD.
 1.21.2.6 20-Jun-2002  nathanw Catch up to -current.
 1.21.2.5 04-May-2002  thorpej Update from trunk.
 1.21.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.21.2.3 28-Feb-2002  nathanw Catch up to -current.
 1.21.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.21.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.24.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.24.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.24.2.1 11-Feb-2002  jdolecek Sync w/ -current.
 1.28.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.28.2.1 20-Jun-2002  gehenna catch up with -current.
 1.34.2.2 19-Oct-2004  skrll Sync with HEAD
 1.34.2.1 03-Aug-2004  skrll Sync with HEAD
 1.35.2.4 13-Aug-2004  jmc branches: 1.35.2.4.2;
Pullup rev 1.39 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.35.2.3 30-May-2004  tron Pull up revision 1.38 (requested by christos in ticket #416):
PR/25441: Matthew Green: IP-Filter uses M_TEMP when it already has M_IPFILTER
 1.35.2.2 01-Apr-2004  jmc Pullup rev 1.37 (requested by mrg in ticket #34)

Only #define COPYIN copyin, et cetera, in the kernel.
 1.35.2.1 31-Mar-2004  tron Pull up revision 1.36 (requested by mrg in ticket #32):
COPYIN/COPYOUT macros need to call copyin/out on NetBSD rather than just use
bcopy.
 1.35.2.4.2.1 06-Feb-2005  jmc Pull up revision 1.40 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.17 09-Dec-2017  pgoyette Split ip_ecn code into its own module, so it can be shared between
gif(4), stf(4), and ipsec(4). Without this, loading the if_gif
module can result in redefined global symbols if either ipsec(4) or
stf(4) but not gif(4) is built into the kernel.

Fixes PR kern/52795 (as reported by martin@ via irc).

XXX pullup to netbsd-8
 1.16 24-Aug-2015  pooka branches: 1.16.10;
sprinkle _KERNEL_OPT
 1.15 05-Sep-2006  rpaulo branches: 1.15.102; 1.15.122;
Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
 1.14 11-Dec-2005  christos branches: 1.14.4; 1.14.8;
merge ktrace-lwp.
 1.13 03-Feb-2005  perry branches: 1.13.6;
ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.12 10-Apr-2002  itojun branches: 1.12.10; 1.12.18; 1.12.20;
correct variable initialization. reported by fujitsu folks
 1.11 13-Nov-2001  lukem add RCSIDs
 1.10 10-May-2001  itojun branches: 1.10.2;
correct ecn consideration on tunnel encap/decap. sync with kame.
 1.9 02-Oct-2000  itojun branches: 1.9.2;
fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.8 06-Jan-2000  itojun remove too much portability code in KAME, to improve readability.
 1.7 12-Dec-1999  itojun sync with latest KAME (rcsid only).
 1.6 31-Jul-1999  itojun branches: 1.6.2; 1.6.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.5 30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwards)
 1.4 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip_ecn.c was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip_ecn.c was added on branch chs-ubc2 on 1999-07-01 23:47:01 +0000
 1.6.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.9.2.3 17-Apr-2002  nathanw Catch up to -current.
 1.9.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.9.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.10.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.10.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.12.20.1 12-Feb-2005  yamt sync with head.
 1.12.18.1 29-Apr-2005  kent sync with -current
 1.12.10.1 04-Feb-2005  skrll Sync with HEAD.
 1.13.6.1 30-Dec-2006  yamt sync with head.
 1.14.8.1 14-Sep-2006  yamt sync with head.
 1.14.4.1 09-Sep-2006  rpaulo sync with head
 1.15.122.1 22-Sep-2015  skrll Sync with HEAD
 1.15.102.1 03-Dec-2017  jdolecek update from HEAD
 1.16.10.1 21-Dec-2017  snj Pull up following revision(s) (requested by pgoyette in ticket #436):
distrib/sets/lists/modules/mi: revision 1.112
sys/modules/Makefile: revision 1.196
sys/modules/ip_ecn/Makefile: revision 1.1
sys/modules/if_gif/Makefile: revision 1.3
sys/net/if_gif.c: revision 1.136
sys/netinet/ip_ecn.c: revision 1.17
Split ip_ecn code into its own module, so it can be shared between
gif(4), stf(4), and ipsec(4). Without this, loading the if_gif
module can result in redefined global symbols if either ipsec(4) or
stf(4) but not gif(4) is built into the kernel.
Fixes PR kern/52795 (as reported by martin@ via irc).
 1.12 12-Nov-2008  ad Remove LKMs and switch to the module framework, pass 1.

Proposed on tech-kern@.
 1.11 10-Dec-2005  elad branches: 1.11.70; 1.11.74; 1.11.80; 1.11.84;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.10 18-Apr-2004  matt branches: 1.10.12;
De __P()
 1.9 21-Dec-2001  itojun branches: 1.9.16;
whitespace. protect from multiple inclusion. sync with kame
 1.8 10-May-2001  itojun branches: 1.8.2;
correct ecn consideration on tunnel encap/decap. sync with kame.
 1.7 12-Dec-1999  itojun branches: 1.7.6;
sync with latest KAME (rcsid only).
 1.6 31-Jul-1999  itojun branches: 1.6.2; 1.6.8;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.5 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.4 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.3 03-Jul-1999  thorpej RCS ID police.
 1.2 01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1 28-Jun-1999  itojun branches: 1.1.2;
file ip_ecn.h was initially added on branch kame.
 1.1.2.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1 01-Jul-1999  thorpej file ip_ecn.h was added on branch chs-ubc2 on 1999-07-01 23:47:01 +0000
 1.6.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.6.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.6.2 08-Jan-2002  nathanw Catch up to -current.
 1.7.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.8.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.16.4 11-Dec-2005  christos Sync with head.
 1.9.16.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.16.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.16.1 03-Aug-2004  skrll Sync with HEAD
 1.10.12.1 21-Jun-2006  yamt sync with head.
 1.11.84.1 19-Jan-2009  skrll Sync with HEAD.
 1.11.80.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.11.74.1 04-May-2009  yamt sync with head.
 1.11.70.1 17-Jan-2009  mjf Sync with HEAD.
 1.78 26-Feb-2025  andvar Fix typos in comments, mainly s/calcurate/calculate/.
 1.77 07-Dec-2022  knakahara branches: 1.77.8;
Refactor ip_encap.[ch]

- remove encap_attach() which is no longer used
- remove USE_RADIX code in ip_encap.c, which is used for
encap_attach() only
- remove mask members in encaptab
 1.76 07-Dec-2022  knakahara Implement encap_attach_addr() which is used by IP-encaped tunnels.

Tunnels attached by encap_attach() can process received packets
quickly because the softc is looked up in a radix tree. However, such tunnels
cannot use a priority function, which selects the tunnel's softc based not only
on source and destination but also on other information.
On the other hand, tunnels attached by encap_attach_func() can
use a priority function. However, receive processing can be slow
because the softc is found by linear search (calling each
priority function).

encap_attach_addr() can be used for tunnels that have a fixed tunnel
source address and tunnel destination address. Tunnels attached
by encap_attach_addr() are looked up with thmap(9), so receive processing
can be fast. Moreover, these tunnels can use a priority function.
 1.75 07-Dec-2022  knakahara refactor: use typedef for ip_encap priority function
 1.74 22-Aug-2020  riastradh Mark KASSERT-only variable __diagused.
 1.73 20-Aug-2020  riastradh [ozaki-r] Changes to the kernel core for wireguard
 1.72 23-Jan-2020  knakahara Fix PR security/54881. Pointed out by ohishi@IIJ, thanks.
 1.71 15-May-2019  knakahara branches: 1.71.2; 1.71.4;
Fix build failure when INET6 is disabled and NET_MPSAFE is enabled. Pointed out by ozaki-r@n.o, thanks.
 1.70 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.69 21-Jun-2018  knakahara branches: 1.69.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: if NET_MPSAFE is not defined, it is
acquired in ipintr(); otherwise (NET_MPSAFE defined) it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.68 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.67 14-Jan-2018  maxv branches: 1.67.2;
Fix memory leak, found by Mootja.
 1.66 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.65 01-Jun-2017  chs branches: 1.65.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.64 15-Apr-2017  riastradh No need for membar_datadep_consumer here.

PSLIST_READER_FOREACH takes care of it already.
PSLIST_WRITER_FOREACH is exclusive so doesn't need it.
 1.63 07-Apr-2017  ozaki-r Commit a forgotten change for "Prepare netipsec for rump-ification"

- Allow encapinit to be called twice (by ifinit and ipe4_attach)
- ifinit didn't call encapinit if IPSEC is enabled (ipe4_attach called
it instead), however, on a rump kernel ipe4_attach may not be called
even if IPSEC is enabled. So we need to allow ifinit to call it anyway
 1.62 22-Dec-2016  knakahara branches: 1.62.2;
pserialize_perform() is required *after* PSLIST_WRITER_REMOVE.
 1.61 04-Jul-2016  knakahara branches: 1.61.2;
make gif(4) and ip_encap MP-ify
 1.60 04-Jul-2016  knakahara refactor: merge encap_init_once() to encapinit()
 1.59 04-Jul-2016  knakahara make encap_lock_{enter,exit} interruptible.
 1.58 04-Jul-2016  knakahara remove extra pserialize_perform()
 1.57 04-Jul-2016  knakahara use pserialize(9) and psref(9) (2/2) : ip_encap radix tree care
 1.56 04-Jul-2016  knakahara use pserialize(9) and psref(9) (1/2) : without ip_encap radix tree care
 1.55 04-Jul-2016  knakahara restore ifdef USE_RADIX (revert ip_encap.c:r1.44)

To help future ip_encap optimization work.
 1.54 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (2/2) : ip_encap side

The last commit does not take care of encaptab. This commit fixes the encaptab
race; encaptab is used not only by gif(4).
 1.53 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.52 26-Feb-2016  knakahara To eliminate gif_softc_list linear search, add extra argument to encapsw.pr_ctlinput().
 1.51 26-Jan-2016  knakahara implement encapsw instead of protosw and uniform prototype.

suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
 1.50 22-Jan-2016  riastradh Back out previous change to introduce struct encapsw.

This change was intended, but Nakahara-san had already made a better
one locally! So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
 1.49 22-Jan-2016  riastradh Don't abuse struct protosw for ip_encap -- introduce struct encapsw.

Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
 1.48 20-Jan-2016  knakahara remove unused variable.
 1.47 09-Dec-2015  knakahara ip_encap uses kmem_alloc APIs instead of malloc.
 1.46 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.45 20-Apr-2015  ozaki-r Remove non-USE_RADIX case and USE_RADIX switch

It seems that we have been using ip_encap only with USE_RADIX
for long years. Let's remove unused non-USE_RADIX case.

No objection on tech-kern and tech-net.

Double-checked by knakahara@
 1.44 16-Apr-2015  ozaki-r Remove garbage undef
 1.43 15-Apr-2015  riastradh KASSERT x then y, not x && y, to give more specific errors.
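The point of revision 1.43 can be illustrated with a small userland sketch. It is not the kernel code: CHECK is a hypothetical stand-in for KASSERT that returns the text of the failed expression instead of panicking, and struct buf is invented for the example. With one combined check the report cannot say which half failed; with two checks it names the exact condition.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for KASSERT: instead of panicking, return the
 * text of the failed expression so the difference is observable.
 */
#define CHECK(expr) do { if (!(expr)) return #expr; } while (0)

/* Invented structure, used only for this illustration. */
struct buf {
	size_t len;
};

/* One combined check: a failure reports the whole conjunction. */
static const char *
check_combined(const struct buf *b)
{
	CHECK(b != NULL && b->len > 0);
	return NULL;
}

/* Two checks: a failure reports only the condition that did not hold. */
static const char *
check_split(const struct buf *b)
{
	CHECK(b != NULL);
	CHECK(b->len > 0);
	return NULL;
}
```

Both functions return NULL when every condition holds; on failure, check_split() returns either "b != NULL" or "b->len > 0", while check_combined() returns the full conjunction.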
 1.42 15-Apr-2015  ozaki-r Use LIST_FOREACH_SAFE

We have to use LIST_FOREACH_SAFE because LIST_REMOVE is used
inside the loop through encap_remove.
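The reason given in revision 1.42 can be sketched in plain C. This is an illustration under assumptions, not the ip_encap.c code: struct node, NODE_FOREACH_SAFE and the helper functions are hypothetical, and the macro only models the idea behind LIST_FOREACH_SAFE on a singly linked list. The successor is saved before the loop body runs, so the body may unlink and free the current element.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type; not the kernel's struct encaptab. */
struct node {
	int key;
	struct node *next;
};

/*
 * Safe traversal: the successor is stored in `tmp` before the body
 * runs, so the body may free `var`. Modeled on LIST_FOREACH_SAFE.
 */
#define NODE_FOREACH_SAFE(var, head, tmp)				\
	for ((var) = (head);						\
	    (var) != NULL && ((tmp) = (var)->next, 1);			\
	    (var) = (tmp))

/* Push a node on the front; returns the old head if allocation fails. */
static struct node *
node_push(struct node *head, int key)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return head;
	n->key = key;
	n->next = head;
	return n;
}

/* Unlink and free every node with the given key; returns the new head. */
static struct node *
node_remove_all(struct node *head, int key)
{
	struct node *n, *tmp, **prevp = &head;

	NODE_FOREACH_SAFE(n, head, tmp) {
		if (n->key == key) {
			*prevp = tmp;	/* unlink n */
			free(n);	/* n must not be touched after this */
		} else {
			prevp = &n->next;
		}
	}
	return head;
}

/* Count the nodes on the list. */
static int
node_count(const struct node *head)
{
	int c = 0;

	for (; head != NULL; head = head->next)
		c++;
	return c;
}
```

A plain traversal would read n->next after free(n) when advancing, which is the use-after-free that the safe variant avoids.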
 1.41 15-Apr-2015  ozaki-r Replace DIAGNOSTIC & panic with KASSERT/KASSERTMSG
 1.40 15-Apr-2015  ozaki-r Add $NetBSD$ at the top of the file
 1.39 17-Jul-2011  joerg branches: 1.39.12; 1.39.30;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.38 27-May-2009  pooka Make it possible to register delayed radix tree head inits which
will be processed when the radix "subsystem" is initialized -- all
users must be attached before any inits to know the max keylength.
Use of link sets is no longer required, and only attached domains
need to be considered.
 1.37 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.36 18-Mar-2009  cegger bcopy -> memcpy
 1.35 18-Mar-2009  cegger bzero -> memset
 1.34 18-Mar-2009  cegger bcmp -> memcmp
 1.33 25-Nov-2008  pooka branches: 1.33.4;
Make dom_maxrtkey of inet/inet6domain the size of the ip_encap pack
structures. This is far from optimal, but gets rid of iffy
#ifdef INET in radix.c. The radix bonsai still needs lots of love
before loading domains dynamically is possible...
 1.32 24-Apr-2008  ad branches: 1.32.2; 1.32.8; 1.32.10;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.31 13-Jun-2007  dyoung branches: 1.31.28; 1.31.30;
Use LIST_FOREACH().
 1.30 04-Mar-2007  christos branches: 1.30.2; 1.30.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.29 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.28 28-May-2006  liamjfoy branches: 1.28.12;
remove some dead code

ok christos@
 1.27 11-Dec-2005  christos branches: 1.27.4; 1.27.6; 1.27.8; 1.27.14;
merge ktrace-lwp.
 1.26 06-Jun-2005  martin branches: 1.26.2;
Since we decided "const struct mbuf *" would not do the right thing (tm),
remove ~all const from mbuf pointers.
 1.25 03-Jun-2005  martin Sprinkle some const
 1.24 02-Jun-2005  tron Change the first argument of the encapsulation check function from
"const struct mbuf *" to "struct mbuf *". Without this change the
actual implementation cannot even use m_copydata() on the mbuf chain
which is broken.
 1.23 03-Feb-2005  perry ANSIfy function prototypes. (Still have about 3/5ths of the C files in
netinet to go...)
 1.22 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.21 24-Jan-2005  enami branches: 1.21.2;
To fix bad pointer dereference on start up when gif is used,
- Allow rn_init() to be called multiple times, but do nothing except the
first call.
- Include opt_inet.h so that #ifdef INET works.
- Call rn_init() from encap_init() explicitly rather than depending on the
order of initialization.
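The "do nothing except the first call" behavior described for rn_init() in revision 1.21 can be sketched as follows. This is a minimal userland illustration with hypothetical names (table_init, table_setup_count); it is not the radix code and, unlike kernel code, makes no attempt at thread safety.

```c
#include <stdbool.h>

/* Hypothetical state standing in for the real one-time setup. */
static bool table_initialized;
static int table_setup_runs;

/* Safe to call from several attach paths; only the first call does work. */
void
table_init(void)
{
	if (table_initialized)
		return;
	table_initialized = true;
	table_setup_runs++;	/* the one-time setup would go here */
}

/* Report how many times the setup actually ran. */
int
table_setup_count(void)
{
	return table_setup_runs;
}
```

With this shape a caller can invoke the initializer explicitly instead of relying on the order in which subsystems happen to be initialized.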
 1.20 24-Jan-2005  itojun get zero-cleared field on malloc. kame-pr-856
 1.19 17-Aug-2004  itojun branches: 1.19.4;
initialize max_keylen for ip_encap.c earlier
 1.18 26-Apr-2004  matt Remove #else clause of __STDC__
 1.17 04-Mar-2004  wiz branches: 1.17.4;
No need to include netinet/ip_mroute.h twice.
Closes PR 24652 by Kailash Sethuraman.
 1.16 30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.15 28-Oct-2003  mycroft Do the previous differently.
 1.14 25-Oct-2003  christos Fix uninitialized variable warning
 1.13 21-Jan-2003  itojun branches: 1.13.2;
correct panic when ip-in-ip encapsulation is used. found by Masanori Kanaoka
 1.12 17-Jan-2003  itojun switch from kame-based m_aux mbuf auxiliary data, to openbsd m_tag
implementation. it will simplify porting across *bsd (such as kame/altq),
and make us more synchronized. from Joel Wilsson
 1.11 25-Nov-2002  thorpej Avoid strict-alias warnings.
 1.10 31-Jul-2002  itojun remove packed attribute as it will cause data be unaligned
 1.9 09-Jun-2002  itojun whitespace
 1.8 04-Mar-2002  sommerfeld branches: 1.8.6; 1.8.8;
The "gif*" tunnelling interface does everything ipip does.
Move usage example from ipip.4 to gif.4
Excise ipip and stitch up the scars.
 1.7 21-Dec-2001  itojun use radix table for inbound tunnel lookup (would increase performance
for machines with a lot of tunnels).
update route cache for IPvX-over-IPv6 tunnel on path MTU discovery.
sync with kame
 1.6 13-Nov-2001  lukem add RCSIDs
 1.5 08-May-2001  itojun branches: 1.5.2;
pull encapsulated packet for vif* via ip_encap framework.
 1.4 02-Oct-2000  itojun branches: 1.4.2; 1.4.4;
fix compilation without INET. fix confusion between ipsecstat and ipsec6stat.
sync with kame.
 1.3 05-Jul-2000  thorpej Make a note that we really should be checking the viftable
in ip_mroute.c for duplicate tunnel entries, too. Well, what
really needs to happen is that the mrouting code needs to be
changed to work w/ `gif' tunnels... but...
 1.2 05-Jul-2000  thorpej Use LIST_HEAD_INITIALIZER(), for correctness sake.
 1.1 19-Apr-2000  itojun introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.4.4.8 17-Jan-2003  thorpej Sync with HEAD.
 1.4.4.7 11-Dec-2002  thorpej Sync with HEAD.
 1.4.4.6 01-Aug-2002  nathanw Catch up to -current.
 1.4.4.5 20-Jun-2002  nathanw Catch up to -current.
 1.4.4.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.4.4.3 08-Jan-2002  nathanw Catch up to -current.
 1.4.4.2 14-Nov-2001  nathanw Catch up to -current.
 1.4.4.1 21-Jun-2001  nathanw Catch up to -current.
 1.4.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.2.1 02-Oct-2000  bouyer file ip_encap.c was added on branch thorpej_scsipi on 2000-11-20 18:10:25 +0000
 1.5.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.5.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.5.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.5.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.8.1 02-Aug-2002  lukem Pull up revision 1.10 (requested by itojun in ticket #593):
remove packed attribute as it will cause data be unaligned
 1.8.6.2 29-Aug-2002  gehenna catch up with -current.
 1.8.6.1 20-Jun-2002  gehenna catch up with -current.
 1.13.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.13.2.6 04-Feb-2005  skrll Sync with HEAD.
 1.13.2.5 24-Jan-2005  skrll Sync with HEAD.
 1.13.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.13.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.13.2.1 03-Aug-2004  skrll Sync with HEAD
 1.17.4.1 06-May-2005  riz Pull up revision 1.20 (requested by bouyer in ticket #1188):
get zero-cleared field on malloc. kame-pr-856
 1.19.4.1 29-Apr-2005  kent sync with -current
 1.21.2.1 12-Feb-2005  yamt sync with head.
 1.26.2.3 03-Sep-2007  yamt sync with head.
 1.26.2.2 26-Feb-2007  yamt sync with head.
 1.26.2.1 21-Jun-2006  yamt sync with head.
 1.27.14.1 19-Jun-2006  chap Sync with head.
 1.27.8.1 26-Jun-2006  yamt sync with head.
 1.27.6.1 01-Jun-2006  kardel Sync with head.
 1.27.4.2 09-Sep-2006  rpaulo sync with head
 1.27.4.1 05-Feb-2006  rpaulo <netinet6/in6_pcb.h> went away. Bye!
 1.28.12.2 12-Mar-2007  rmind Sync with HEAD.
 1.28.12.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.30.4.1 11-Jul-2007  mjf Sync with head.
 1.30.2.1 15-Jul-2007  ad Sync with head.
 1.31.30.1 18-May-2008  yamt sync with head.
 1.31.28.2 17-Jan-2009  mjf Sync with HEAD.
 1.31.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.32.10.2 28-Apr-2009  skrll Sync with HEAD.
 1.32.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.32.8.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.32.2.2 20-Jun-2009  yamt sync with head
 1.32.2.1 04-May-2009  yamt sync with head.
 1.33.4.2 23-Jul-2009  jym Sync with HEAD.
 1.33.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.39.30.8 28-Aug-2017  skrll Sync with HEAD
 1.39.30.7 05-Feb-2017  skrll Sync with HEAD
 1.39.30.6 09-Jul-2016  skrll Sync with HEAD
 1.39.30.5 29-May-2016  skrll Sync with HEAD
 1.39.30.4 19-Mar-2016  skrll Sync with HEAD
 1.39.30.3 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.39.30.2 22-Sep-2015  skrll Sync with HEAD
 1.39.30.1 06-Jun-2015  skrll Sync with HEAD
 1.39.12.1 03-Dec-2017  jdolecek update from HEAD
 1.61.2.2 26-Apr-2017  pgoyette Sync with HEAD
 1.61.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.62.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.65.2.5 23-Jan-2020  martin Pull up following revision(s) (requested by knakahara in ticket #1489):

sys/netinet/ip_encap.c: revision 1.72

Fix PR security/54881. Pointed out by ohishi@IIJ, thanks.
 1.65.2.4 29-May-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1271):

sys/netinet/ip_encap.c: revision 1.71

Fix build failure when INET6 is disabled. Pointed out by ozaki-r@n.o, thanks.
 1.65.2.3 13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: if NET_MPSAFE is not defined, it is
acquired in ipintr(); otherwise (NET_MPSAFE defined) it is acquired in the
PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.65.2.2 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #661):

sys/netinet/ip_encap.c: revision 1.67

Fix memory leak, found by Mootja.
 1.65.2.1 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.67.2.3 30-Sep-2018  pgoyette Sync with HEAD
 1.67.2.2 25-Jun-2018  pgoyette Sync with HEAD
 1.67.2.1 02-May-2018  pgoyette Synch with HEAD
 1.69.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.69.2.1 10-Jun-2019  christos Sync with HEAD
 1.71.4.1 25-Jan-2020  ad Sync with head.
 1.71.2.1 23-Jan-2020  martin Pull up following revision(s) (requested by knakahara in ticket #644):

sys/netinet/ip_encap.c: revision 1.72

Fix PR security/54881. Pointed out by ohishi@IIJ, thanks.
 1.77.8.1 02-Aug-2025  perseant Sync with HEAD
 1.28 07-Dec-2022  knakahara Refactor ip_encap.[ch]

- remove encap_attach() which is no longer used
- remove USE_RADIX code in ip_encap.c, which is used for
encap_attach() only
- remove mask members in encaptab
 1.27 07-Dec-2022  knakahara Implement encap_attach_addr() which is used by IP-encaped tunnels.

Tunnels attached by encap_attach() can process received packets
quickly because the softc is looked up in a radix tree. However, such tunnels
cannot use a priority function, which selects the tunnel's softc based not only
on source and destination but also on other information.
On the other hand, tunnels attached by encap_attach_func() can
use a priority function. However, receive processing can be slow
because the softc is found by linear search (calling each
priority function).

encap_attach_addr() can be used for tunnels that have a fixed tunnel
source address and tunnel destination address. Tunnels attached
by encap_attach_addr() are looked up with thmap(9), so receive processing
can be fast. Moreover, these tunnels can use a priority function.
 1.26 07-Dec-2022  knakahara refactor: use typedef for ip_encap priority function
 1.25 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.24 28-Feb-2018  maxv branches: 1.24.2; 1.24.4;
Remove unused mbuf tags.
 1.23 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.22 04-Jul-2016  knakahara branches: 1.22.10;
make encap_lock_{enter,exit} interruptible.
 1.21 04-Jul-2016  knakahara use pserialize(9) and psref(9) (1/2) : without ip_encap radix tree care
 1.20 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (2/2) : ip_encap side

The last commit does not take care of encaptab. This commit fixes the encaptab
race; encaptab is used not only by gif(4).
 1.19 29-Feb-2016  knakahara remove unnecessary declarations and fix KNF

Thanks to riastradh@
 1.18 26-Feb-2016  knakahara To eliminate gif_softc_list linear search, add extra argument to encapsw.pr_ctlinput().
 1.17 26-Jan-2016  knakahara eliminate variable argument in encapsw
 1.16 26-Jan-2016  knakahara implement encapsw instead of protosw and uniform prototype.

suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
 1.15 22-Jan-2016  riastradh Back out previous change to introduce struct encapsw.

This change was intended, but Nakahara-san had already made a better
one locally! So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
 1.14 22-Jan-2016  riastradh Don't abuse struct protosw for ip_encap -- introduce struct encapsw.

Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
 1.13 25-Nov-2008  pooka branches: 1.13.26; 1.13.44;
Make dom_maxrtkey of inet/inet6domain the size of the ip_encap pack
structures. This is far from optimal, but gets rid of iffy
#ifdef INET in radix.c. The radix bonsai still needs lots of love
before loading domains dynamically is possible...
 1.12 24-Apr-2008  ad branches: 1.12.2; 1.12.8; 1.12.10;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.11 17-Feb-2007  dyoung branches: 1.11.38; 1.11.40;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.10 10-Dec-2005  elad branches: 1.10.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.9 06-Jun-2005  martin branches: 1.9.2;
Since we decided "const struct mbuf *" would not do the right thing (tm),
remove ~all const from mbuf pointers.
 1.8 03-Jun-2005  martin Sprinkle some const
 1.7 02-Jun-2005  tron Change the first argument of the encapsulation check function from
"const struct mbuf *" to "struct mbuf *". Without this change the
actual implementation cannot even use m_copydata() on the mbuf chain
which is broken.
 1.6 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.5 17-Aug-2004  itojun branches: 1.5.4; 1.5.6;
initialize max_keylen for ip_encap.c earlier
 1.4 18-Apr-2004  matt De __P()
 1.3 17-Jan-2003  itojun branches: 1.3.2;
switch from kame-based m_aux mbuf auxiliary data, to openbsd m_tag
implementation. it will simplify porting across *bsd (such as kame/altq),
and make us more synchronized. from Joel Wilsson
 1.2 21-Dec-2001  itojun use radix table for inbound tunnel lookup (would increase performance
for machines with a lot of tunnels).
update route cache for IPvX-over-IPv6 tunnel on path MTU discovery.
sync with kame
 1.1 19-Apr-2000  itojun branches: 1.1.6; 1.1.8; 1.1.10;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.1.10.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.1.8.2 17-Jan-2003  thorpej Sync with HEAD.
 1.1.8.1 08-Jan-2002  nathanw Catch up to -current.
 1.1.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.1.6.1 19-Apr-2000  bouyer file ip_encap.h was added on branch thorpej_scsipi on 2000-11-20 18:10:25 +0000
 1.3.2.7 11-Dec-2005  christos Sync with head.
 1.3.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.3.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.3.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.2 25-Aug-2004  skrll Sync with HEAD.
 1.3.2.1 03-Aug-2004  skrll Sync with HEAD
 1.5.6.1 12-Feb-2005  yamt sync with head.
 1.5.4.1 29-Apr-2005  kent sync with -current
 1.9.2.2 26-Feb-2007  yamt sync with head.
 1.9.2.1 21-Jun-2006  yamt sync with head.
 1.10.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.11.40.1 18-May-2008  yamt sync with head.
 1.11.38.2 17-Jan-2009  mjf Sync with HEAD.
 1.11.38.1 02-Jun-2008  mjf Sync with HEAD.
 1.12.10.1 19-Jan-2009  skrll Sync with HEAD.
 1.12.8.1 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.12.2.1 04-May-2009  yamt sync with head.
 1.13.44.2 09-Jul-2016  skrll Sync with HEAD
 1.13.44.1 19-Mar-2016  skrll Sync with HEAD
 1.13.26.1 03-Dec-2017  jdolecek update from HEAD
 1.22.10.1 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.24.4.1 10-Jun-2019  christos Sync with HEAD
 1.24.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.22 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.21 26-Jan-2018  maxv branches: 1.21.2; 1.21.4;
A few fixes:

* Style.

* Don't add M_PKTHDR manually, that's absolutely forbidden. Add a
KASSERT to make sure it's already there.

* Add a missing NULL check after m_pullup.
 1.20 11-Jan-2017  ozaki-r branches: 1.20.8;
Get rid of unnecessary header inclusions
 1.19 15-Dec-2016  ozaki-r Move bpf_mtap and if_ipackets++ on Rx of each driver to percpuq if_input

The benefits of the change are:
- We can reduce code
- We can provide the same behavior between drivers
  - Where/when if_ipackets is counted up
  - Note that some drivers still update packet statistics in their own
    way (periodic update)
- bpf_mtap now runs in softint
  - This makes it easy to MP-ify bpf

Proposed on tech-kern and tech-net
 1.18 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function on a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.17 10-Jun-2016  ozaki-r branches: 1.17.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.16 09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.15 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.14 17-Jul-2011  joerg branches: 1.14.12; 1.14.30;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.13 05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.12 19-Jan-2010  pooka branches: 1.12.2; 1.12.4;
Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.11 19-Oct-2008  hans if_input needs to be called at splnet(). ok by cube.
 1.10 16-Oct-2008  hans include bpf headers so that the bpf calls actually do something. ok by cube.
 1.9 12-Apr-2008  thorpej branches: 1.9.4; 1.9.10;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
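The per-CPU scheme described above can be illustrated with a minimal userland sketch. It is not the kernel code: the names (demo_stat_inc, demo_stat_collate, DEMO_NCPU) are hypothetical, the CPU index is passed explicitly instead of being taken from the running CPU, and no percpu(9) facility is used. Each CPU bumps only its own slot, and the totals are summed only when somebody asks for them.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes and counter indices, for illustration only. */
#define DEMO_NCPU 4
enum { DEMO_STAT_TOTAL, DEMO_STAT_BADSUM, DEMO_NSTATS };

/* One private row of counters per CPU, so updates never contend. */
static uint64_t demo_percpu_stats[DEMO_NCPU][DEMO_NSTATS];

/* Fast path: a CPU increments a counter in its own row. */
void
demo_stat_inc(unsigned cpu, unsigned stat)
{
	demo_percpu_stats[cpu][stat]++;
}

/* Slow path: sum every CPU's row into one array on request. */
void
demo_stat_collate(uint64_t out[DEMO_NSTATS])
{
	unsigned cpu, i;

	memset(out, 0, sizeof(uint64_t) * DEMO_NSTATS);
	for (cpu = 0; cpu < DEMO_NCPU; cpu++)
		for (i = 0; i < DEMO_NSTATS; i++)
			out[i] += demo_percpu_stats[cpu][i];
}
```

The cost of reading the statistics grows with the number of CPUs, which is acceptable because reads are rare compared with per-packet updates.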
 1.8 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.7 20-Dec-2007  dyoung branches: 1.7.6;
Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.6 11-Dec-2007  lukem use __KERNEL_RCSID()
 1.5 02-May-2007  dyoung branches: 1.5.8; 1.5.16; 1.5.18; 1.5.20;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
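
As an illustration of item 1 above, here is a minimal userland sketch of the alloc/copy/dup/free pattern. It assumes a simplified demo_sockaddr type and per-family sizes invented for the example, uses calloc(3) in place of the per-family pools, drops the 'flags' argument, and uses assert() where the kernel would use KASSERT. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BSD-style sockaddr with a length field. */
struct demo_sockaddr {
	uint8_t sa_len;		/* number of meaningful bytes */
	uint8_t sa_family;	/* address family */
	char	sa_data[30];	/* family-specific payload */
};

enum { DEMO_AF_INET = 2, DEMO_AF_INET6 = 24 };

/* Length for each known family; 0 means the family is unknown. */
static uint8_t
demo_family_len(uint8_t af)
{
	switch (af) {
	case DEMO_AF_INET:	return 16;
	case DEMO_AF_INET6:	return 28;
	default:		return 0;
	}
}

/* Returns a zeroed address with sa_family and sa_len set, or NULL. */
struct demo_sockaddr *
demo_sockaddr_alloc(uint8_t af)
{
	uint8_t len = demo_family_len(af);
	struct demo_sockaddr *sa;

	if (len == 0 || (sa = calloc(1, sizeof(*sa))) == NULL)
		return NULL;
	sa->sa_len = len;
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): copies src over dst; the families must match. */
struct demo_sockaddr *
demo_sockaddr_copy(struct demo_sockaddr *dst, const struct demo_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocates a new address and copies src into it. */
struct demo_sockaddr *
demo_sockaddr_dup(const struct demo_sockaddr *src)
{
	struct demo_sockaddr *dst = demo_sockaddr_alloc(src->sa_family);

	return dst == NULL ? NULL : demo_sockaddr_copy(dst, src);
}

void
demo_sockaddr_free(struct demo_sockaddr *sa)
{
	free(sa);
}
```

As with the real routines, callers have to check the result of the alloc and dup calls for NULL before using it.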
 1.4 17-Feb-2007  dyoung branches: 1.4.4; 1.4.6;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.3 15-Dec-2006  joerg branches: 1.3.2; 1.3.4; 1.3.6;
Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone look up ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.2 06-Dec-2006  jdc branches: 1.2.2;
Explicitly include <sys/device.h>, which we need for `struct device'.
This allows us to compile on !i386. (On i386, <machine/cpu.h> pulled
in <sys/device.h> for us, thus hiding the compilation problem.)

OK by rpaulo@.
 1.1 23-Nov-2006  rpaulo branches: 1.1.2;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.1.2.1 09-Dec-2006  bouyer Pull up following revision(s) (requested by jdc in ticket #259):
sys/netinet6/ip6_etherip.c: revision 1.2
sys/netinet/ip_etherip.c: revision 1.2
Explicitly include <sys/device.h>, which we need for `struct device'.
This allows us to compile on !i386. (On i386, <machine/cpu.h> pulled
in <sys/device.h> for us, thus hiding the compilation problem.)
OK by rpaulo@.
 1.2.2.3 18-Dec-2006  yamt sync with head.
 1.2.2.2 10-Dec-2006  yamt sync with head.
 1.2.2.1 06-Dec-2006  yamt file ip_etherip.c was added on branch yamt-splraiseipl on 2006-12-10 07:19:10 +0000
 1.3.6.2 07-May-2007  yamt sync with head.
 1.3.6.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.3.4.2 12-Jan-2007  ad Sync with head.
 1.3.4.1 15-Dec-2006  ad file ip_etherip.c was added on branch newlock2 on 2007-01-12 01:04:14 +0000
 1.3.2.5 21-Jan-2008  yamt sync with head
 1.3.2.4 03-Sep-2007  yamt sync with head.
 1.3.2.3 26-Feb-2007  yamt sync with head.
 1.3.2.2 30-Dec-2006  yamt sync with head.
 1.3.2.1 15-Dec-2006  yamt file ip_etherip.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.4.6.1 11-Jul-2007  mjf Sync with head.
 1.4.4.1 08-Jun-2007  ad Sync with head.
 1.5.20.2 02-Jan-2008  bouyer Sync with HEAD
 1.5.20.1 13-Dec-2007  bouyer Sync with HEAD
 1.5.18.1 11-Dec-2007  yamt sync with head.
 1.5.16.1 26-Dec-2007  ad Sync with head.
 1.5.8.1 09-Jan-2008  matt sync with HEAD
 1.7.6.2 17-Jan-2009  mjf Sync with HEAD.
 1.7.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.9.10.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.9.10.1 19-Oct-2008  haad Sync with HEAD.
 1.9.4.3 11-Aug-2010  yamt sync with head.
 1.9.4.2 11-Mar-2010  yamt sync with head
 1.9.4.1 04-May-2009  yamt sync with head.
 1.12.4.1 30-May-2010  rmind sync with head
 1.12.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.14.30.4 05-Feb-2017  skrll Sync with HEAD
 1.14.30.3 09-Jul-2016  skrll Sync with HEAD
 1.14.30.2 19-Mar-2016  skrll Sync with HEAD
 1.14.30.1 22-Sep-2015  skrll Sync with HEAD
 1.14.12.1 03-Dec-2017  jdolecek update from HEAD
 1.17.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.17.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.20.8.1 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #694):

sys/netinet6/ip6_etherip.c: revision 1.22
sys/net/if_etherip.c: revision 1.41
sys/net/if_etherip.c: revision 1.42
sys/netinet/ip_etherip.c: revision 1.21

Don't call if_attach, do if_initialize+if_register, otherwise when an
EtherIP packet is received the first KASSERT in if_input() fires.

A few fixes:
* Style.
* Don't add M_PKTHDR manually, that's absolutely forbidden. Add a
KASSERT to make sure it's already there.
* Add a missing NULL check after m_pullup.
 1.21.4.1 10-Jun-2019  christos Sync with HEAD
 1.21.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.2 14-Aug-2018  maxv Retire EtherIP, we have L2TP instead.
 1.1 23-Nov-2006  rpaulo branches: 1.1.4; 1.1.6; 1.1.8; 1.1.146; 1.1.148;
New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.1.148.1 10-Jun-2019  christos Sync with HEAD
 1.1.146.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.1.8.2 12-Jan-2007  ad Sync with head.
 1.1.8.1 23-Nov-2006  ad file ip_etherip.h was added on branch newlock2 on 2007-01-12 01:04:14 +0000
 1.1.6.2 30-Dec-2006  yamt sync with head.
 1.1.6.1 23-Nov-2006  yamt file ip_etherip.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.1.4.2 10-Dec-2006  yamt sync with head.
 1.1.4.1 23-Nov-2006  yamt file ip_etherip.h was added on branch yamt-splraiseipl on 2006-12-10 07:19:10 +0000
 1.96 28-Mar-2004  martti Upgraded IPFilter to 4.1.1
 1.95 22-Aug-2003  itojun correct missing inclusion of opt_ipsec.h
 1.94 15-Aug-2003  martti Fix return-rst for IPv6 (PR#22157 by Peter Postma).
 1.93 30-Jun-2003  itojun branches: 1.93.2;
remove IPv4 hook if IPv6 hook fails (seems to be cut-and-paste error).
 1.92 29-Jun-2003  fvdl Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.91 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.90 26-Jun-2003  itojun tabify
 1.89 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.88 11-Dec-2002  atatat Always recompute the IP checksum, otherwise fast-routed packets that
also get natted leave with an invalid checksum which can prevent
things from working properly.
 1.87 23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework;
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals.

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.86 19-Sep-2002  martti Resync with official IPF
 1.85 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.84 07-Sep-2002  enami Make usr.sbin/ipf/ipftest compile again.
 1.83 06-Sep-2002  gehenna The device switch ``ipl_cdevsw'' is defined after 1.6H.
 1.82 06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed in the port-dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
 1.81 09-Jun-2002  itojun whitespace
 1.80 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.79 02-May-2002  martti branches: 1.79.2; 1.79.4;
Fix compilation problems
 1.78 02-May-2002  martti Upgraded IPFilter to 3.4.27
 1.77 14-Mar-2002  martti Added (char *) for pointer arithmetic
 1.76 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.75 24-Jan-2002  martti Fixed initialization
 1.74 24-Jan-2002  martti Re-sync with IPFilter
 1.73 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.72 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.71 13-Nov-2001  lukem add RCSIDs
 1.70 18-Oct-2001  thorpej Deprecate the "m_act" alias of "m_nextpkt" (m_act is a historical
name), and just use m_nextpkt everywhere.
 1.69 17-Sep-2001  thorpej Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.68 02-Jun-2001  thorpej branches: 1.68.2; 1.68.4;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
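
To illustrate the idea of caching the pseudo-header sum and finishing the checksum later, here is a minimal sketch of the software fallback path. It is not the code from this commit: the demo_* names are hypothetical, addresses and lengths are passed as host-order integers for simplicity, and the segment is summed byte by byte rather than with an optimized in_cksum routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
uint16_t
demo_cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Partial sum over the IPv4 pseudo-header (source, destination,
 * protocol, transport length). This is the part that can be computed
 * early and cached before the payload is known to be final.
 */
uint32_t
demo_pseudo_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	uint32_t sum = 0;

	sum += (src >> 16) + (src & 0xffff);
	sum += (dst >> 16) + (dst & 0xffff);
	sum += proto;
	sum += len;
	return sum;
}

/* Add the bytes of a segment, as big-endian 16-bit words, to a sum. */
uint32_t
demo_sum_bytes(uint32_t sum, const uint8_t *p, size_t n)
{
	while (n > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		n -= 2;
	}
	if (n == 1)
		sum += (uint32_t)p[0] << 8;	/* pad the odd byte with zero */
	return sum;
}

/* Delayed step: complement the folded sum to get the checksum field. */
uint16_t
demo_cksum_finish(uint32_t sum)
{
	return (uint16_t)~demo_cksum_fold(sum);
}
```

In this sketch a sender would call demo_pseudo_sum() once, and only call demo_sum_bytes() and demo_cksum_finish() if the outgoing interface cannot complete the sum in hardware. A receiver that sums a segment including a correct checksum field gets a folded result of 0xffff.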
 1.67 12-May-2001  christos - Handle realloc failure without leaking memory
(reported by: grendel@heorot.stanford.edu (Ted U))
- Don't cast malloc/realloc/calloc return values because they hide LP64 bugs.
- Don't destroy the whole array when realloc fails
- Use calloc in all cases (malloc was used inconsistently).
- Avoid duplicating code.

Reviewed by: ross
 1.66 26-Mar-2001  mike Resolve conflicts.
 1.65 05-Feb-2001  chs branches: 1.65.2;
expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.64 28-Dec-2000  thorpej Fix a small typo that would cause IP Filter to not hook in to
pfil_hooks properly on kernels that included IPv6 support.
 1.63 28-Dec-2000  thorpej Back out the sledgehammer damage applied by wiz while I was out for
the holiday.
 1.62 25-Dec-2000  wiz Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.
 1.61 22-Dec-2000  thorpej Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.
 1.60 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
 1.59 22-Aug-2000  sommerfeld Fill in next mtu field of NEEDFRAG ICMP error message.
From Marc Horowitz, pr10857
 1.58 09-Aug-2000  veego Resolve conflicts.
 1.57 01-Aug-2000  thorpej Slight adjustment to last, to allow the userland version to build.
 1.56 01-Aug-2000  thorpej - ipl_enable(): -1 is not an error return. If initializing IP Filter
fails, return EIO instead.

- iplioctl(): If performing a NAT operation, and IP Filter is not
yet initialized (e.g. by `ipf -E'), enable it implicitly before
doing the NAT operation.
 1.55 12-Jun-2000  veego branches: 1.55.2;
Remove a duplicated check for the NetBSD callout (I think it is a mistake
from a previous conflict resolve which doesn't cause harm).
 1.54 12-Jun-2000  veego Resolve conflicts.
 1.53 23-May-2000  veego branches: 1.53.2;
Resolve conflicts.
 1.52 21-May-2000  veego Add a missing ; at the end of a line.
 1.51 21-May-2000  veego Resolve conflicts.
 1.50 11-May-2000  veego Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.49 03-May-2000  veego Resolve conflicts.
 1.48 16-Apr-2000  chs remove ifdefs to skip htons() on some big-endian platforms.
 1.47 30-Mar-2000  augustss Remove register declarations.
 1.46 24-Mar-2000  thorpej Pull in <sys/callout.h> for the benefit of userland.
 1.45 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
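
The two properties listed above (client-supplied handle storage and constant-time insertion and removal) can be sketched with a small timing wheel built from intrusive doubly-linked lists. This is an illustrative userland model only, not the kernel's callout implementation: the demo_* names, the wheel size, and the explicit demo_tick() driver are assumptions made for the example.

```c
#include <stddef.h>

#define DEMO_WHEEL_SIZE 8	/* hypothetical number of buckets */

/* The handle lives in the client's own storage; nothing is allocated. */
struct demo_callout {
	struct demo_callout *next, *prev;	/* NULL when not pending */
	unsigned long expiry;			/* absolute tick to fire at */
	void (*func)(void *);
	void *arg;
};

static struct demo_callout demo_wheel[DEMO_WHEEL_SIZE];
static unsigned long demo_now;

/* Make every bucket an empty circular list and reset the clock. */
void
demo_callout_setup(void)
{
	int i;

	for (i = 0; i < DEMO_WHEEL_SIZE; i++)
		demo_wheel[i].next = demo_wheel[i].prev = &demo_wheel[i];
	demo_now = 0;
}

/* O(1) removal: unlink the handle if it is pending. */
void
demo_callout_stop(struct demo_callout *c)
{
	if (c->next == NULL)
		return;
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->next = c->prev = NULL;
}

/* O(1) insertion: (re)arm the handle to fire in 'ticks' (>= 1) ticks. */
void
demo_callout_reset(struct demo_callout *c, unsigned long ticks,
    void (*func)(void *), void *arg)
{
	struct demo_callout *head;

	demo_callout_stop(c);
	c->expiry = demo_now + ticks;
	c->func = func;
	c->arg = arg;
	head = &demo_wheel[c->expiry % DEMO_WHEEL_SIZE];
	c->next = head;
	c->prev = head->prev;
	head->prev->next = c;
	head->prev = c;
}

/* Advance one tick and fire the handles that expire now. */
void
demo_tick(void)
{
	struct demo_callout *head, *c, *next;

	demo_now++;
	head = &demo_wheel[demo_now % DEMO_WHEEL_SIZE];
	for (c = head->next; c != head; c = next) {
		next = c->next;
		if (c->expiry == demo_now) {
			demo_callout_stop(c);
			c->func(c->arg);
		}
	}
}

/* Example callback: counts how many times it has been invoked. */
void
demo_count(void *arg)
{
	++*(int *)arg;
}
```

A handle must be zero-initialized before its first use, and in this simplified model a callback must not stop or re-arm other handles in the bucket being processed.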
 1.44 07-Mar-2000  mycroft Fix a splx() botch or two.
 1.43 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.42 20-Feb-2000  darrenr pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".
 1.41 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.40 01-Feb-2000  veego Only print one 'IP Filter:' line when it gets enabled or disabled.
 1.39 01-Feb-2000  veego Resolve conflicts.
 1.38 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.37 12-Oct-1999  sommerfeld branches: 1.37.2;
in ipfr_fastroute, before calling icmp_error(), put received-interface
back into the packet. (ip_output() clears it since ipsec reuses that
packet field in the output path. by putting it back, we're going to
pretend we're back on the input path now).
 1.36 26-Aug-1999  itojun branches: 1.36.2;
clear m->m_pkthdr.rcvif before calling ip_output().
the member is used to pass struct socket to ip{,6}_output for ipsec decisions.

(i agree it is kind of ugly. we need to modify struct mbuf if we are
to do better - which seems to me a bit too much)
 1.35 26-Aug-1999  marc when fastrouting a packet which needs fragmentation, the packet passed
to if_output did not have m->m_pkthdr.len set correctly. Add the code
to do this from the similar code in ip_output.c
 1.34 02-Feb-1999  cjs branches: 1.34.2; 1.34.6;
Remove SCCS markers and make these compile in $NetBSD$ IDs.
 1.33 19-Jan-1999  mycroft There's just no plausible reason to byte-swap ip_id internally. It's opaque.
 1.32 25-Nov-1998  sommerfe Fragments should start with a header mbuf allocated by MGETHDR()
 1.31 22-Nov-1998  mrg merge ipf 3.2.10
 1.30 15-Nov-1998  drochner fix the previous: "securelevel" in kernel only
 1.29 14-Nov-1998  tls In 'highly secure' mode (securelevel >= 2), the filter lists may not be tampered with. It might be desirable to allow enabling of preset filter lists, but it seems too good a candidate for a denial-of-service attack, so we don't.
 1.28 17-Jul-1998  sommerfe Fix PR5559: if fast-forwarding, DF set, and packet too large, send ICMP error.
 1.27 17-May-1998  veego Resolve conflicts
 1.26 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.25 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.24 21-Sep-1997  veego branches: 1.24.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.23 06-Jul-1997  thorpej branches: 1.23.2;
The fingerprint of (*fr_checkp)() is the same if compiling in kernel
or user code.
 1.22 06-Jul-1997  thorpej Restore original RCS IDs.
 1.21 06-Jul-1997  thorpej - Add a prototype for fixskip() so that this file compiles.
- Fix, ONCE AGAIN, semantics of ipfilterattach(). This time, not only
was it semantically broken, it wasn't even close to compiling!
 1.20 05-Jul-1997  darrenr fix conflicts from import
 1.19 01-Jun-1997  thorpej In ipl_disable(), don't conditionalize the "fr_checkp = fr_savep"
operation, since:
- in ipl_enable(), "fr_savep = fr_checkp" is not conditionalized
in the same way (not at all), and
- without this change, it was not possible to enable, disable,
and reenable ipfilter.
 1.18 28-May-1997  thorpej Put the #ifndef _KERNEL prototype of get_unit() in <netinet/ip_fil.h>
since it is needed by other files, in order to compile on 64-bit
architectures.
 1.17 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.16 27-May-1997  thorpej Make this compile on 32-bit architectures again:
- Fix a really obvious error: ipl_enable() disappeared, but the guts of
the function were scrunched into the "no-op" BSD pseudo-device attach
routine. Would not compile, because of non-void return from a void
function. Fixed by reincarnating ipl_enable(), and reimplementing
the no-op pseudo-device attach.
- #ifdef as appropriate to remove unused variable warnings.
- Call ipl_enable() in iplinit(), rather than the no-op ipfilterattach().
 1.15 26-May-1997  darrenr remove extra #endif
 1.14 25-May-1997  darrenr fix conflicts
 1.13 15-Apr-1997  christos - Fix indentation of the nested conditionals. It was inconsistent in places.
- Make this compile and work without IPFILTER_LOG, and disable logging by
default. This can be re-enabled now as a kernel option.
 1.12 04-Apr-1997  cgd include <stdlib.h> if !_KERNEL for malloc declaration/proto
 1.11 03-Apr-1997  cgd fix ... potentially fatal typo (s/unix/unit/)
 1.10 01-Apr-1997  augustss Make it compile again by removing a cast to void of KFREE(). KFREE expands
to a statement, not an expression.
 1.9 29-Mar-1997  darrenr use IPLLOG instead of ipllog to easily mask parameters, fix up prototype
problems for compiling to user programs.
 1.8 29-Mar-1997  thorpej Centralize the check for NetBSD PFIL_HOOKS code into ip_fil.h, and use
it consistently.
 1.7 29-Mar-1997  thorpej Fix an ... interesting bug that resulted from namespace collision.
Description:

- A BSD pseudo-device initialization routine is declared as
void <pseudo-device name>attach __P((int count));
in ioconf.c by config(8). main() calls these functions
from a table.

- IP Filter has functions iplattach() and ipldetach() (or,
in the NetBSD case, were erroneously renamed ipfilterattach()
and ipfilterdetach()). These functions are used to establish
and disestablish the IP Filter "filter rule check" hook in
the IP input/output stream. They are declared:
int iplattach __P((void));
int ipldetach __P((void));
..and are expected to return a value by iplioctl().

- When main() calls (by sheer coincidence!) iplattach(),
the filter hook is established, and the IP Filter machinery
labeled as "initialized". This causes all packets, whether or
not the user intends to use filter rules, to be passed to
the filter rule checker if "ipfilter" is configured into the
kernel.

- As a result of the above, a kludge existed to default to
passing all packets (I can only assume that when this was
originally committed, the symptom of the bug was noticed by
the integrator, but the bug not actually found/fixed).

- In iplioctl(), if the SIOCFRENB ioctl is issued with an
argument of "enable" (i.e. user executed "ipf -E"), iplattach()
will notice that the machinery is already initialized and
return EBUSY.

Fix:

- Rename iplattach()/ipldetach() to ipl_enable() and ipl_disable().

- Create a pseudo-device entry stub named ipfilterattach()
(NetBSD case) or iplattach() (all other). This is a noop; none
of the machinery should be initialized until the caller explicitly
enables the filter with ipf -E. Add a comment to note that.
 1.6 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.5 18-Mar-1997  cgd ioctl commands are u_longs
 1.4 08-Jan-1997  veego ipl[attach|detach]->ipfilter[attach|detach] for the pseudo-device change
 1.3 07-Jan-1997  mrg remove some old debugging statements.
 1.2 05-Jan-1997  veego Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.24 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.23 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.22 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.21 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.20 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.19 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.18 12-Jun-2000  veego Import IP Filter 3.4.6
 1.1.1.17 23-May-2000  veego Import IP Filter 3.4.4
 1.1.1.16 21-May-2000  veego Import IP Filter 3.4.3
 1.1.1.15 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.14 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.13 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.12 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.11 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.10 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.23.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.24.2.5 29-Nov-1998  cgd pull up rev 1.32 from trunk (mrg)
 1.24.2.4 24-Nov-1998  cgd pull up rev(s) 1.31 from trunk (ipfilter 3.2.10). (mrg)
 1.24.2.3 22-Jul-1998  mellon Pull up 1.28 (veego)
 1.24.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.24.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.34.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.34.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.34.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.34.2.5 14-Dec-2000  he Apply patch (requested by he):
Fix problem causing only first ipnat rule to be loaded.
Fixes PR#11569.
 1.34.2.4 09-Aug-2000  thorpej Pull up rev. 1.57, as was done on netbsd-1-5 branch, to
fix a compilation problem in userland.
 1.34.2.3 02-Aug-2000  he Pull up revision 1.56 (via patch, requested by thorpej):
Properly report initialization error. Enable IPF automatically
if it wasn't already enabled before doing a NAT operation.
 1.34.2.2 20-Dec-1999  he Pull up patch (requested by he):
Make this compile on the netbsd-1-4 branch.
 1.34.2.1 20-Dec-1999  he Pull up revision 1.35 (requested by darrenr):
Update IPF to version 3.3.5.
 1.36.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.37.2.5 27-Mar-2001  bouyer Sync with HEAD.
 1.37.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.37.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.37.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.37.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.53.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.55.2.6 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.55.2.5 09-Feb-2002  he Pull up revisions 1.59-1.75 (via patch, requested by martti):
Updated IPFilter to 3.4.23.
 1.55.2.4 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Casper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Casper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.55.2.3 31-Aug-2000  sommerfeld Fill in the next mtu field of a NEEDFRAG ICMP error message.
From marc horowitz, pr10857. approved by *someone* on releng-1-5 a while back
 1.55.2.2 07-Aug-2000  thorpej Pull up rev. 1.57, to fix userland compilation problem,
as noted by Matthias Drochner.
 1.55.2.1 06-Aug-2000  thorpej Pull up rev. 1.56:
- ipl_enable(): -1 is not an error return. If initializing IP Filter
fails, return EIO instead.

- iplioctl(): If performing a NAT operation, and IP Filter is not
yet initialized (e.g. by `ipf -E'), enable it implicitly before
doing the NAT operation.
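
A minimal sketch of the two behaviours in this pull-up, written as a userland
model rather than the actual kernel code. The names model_init(),
model_enable(), model_nat_ioctl(), filter_running and init_should_fail are
all hypothetical, and the EBUSY case for an already running filter is an
assumption carried over from the older log messages.

```c
#include <errno.h>

static int filter_running;    /* hypothetical "IP Filter is enabled" flag */
static int init_should_fail;  /* hypothetical knob to simulate a failure */

/* Stand-in for the low-level initialization; returns -1 on failure. */
static int model_init(void)
{
    return init_should_fail ? -1 : 0;
}

/* Enable the filter, translating an init failure into a real errno. */
int model_enable(void)
{
    if (filter_running)
        return EBUSY;
    if (model_init() != 0)
        return EIO;           /* -1 is not a valid error return */
    filter_running = 1;
    return 0;
}

/* A NAT request enables the filter implicitly if it is not running yet. */
int model_nat_ioctl(void)
{
    if (!filter_running) {
        int error = model_enable();
        if (error != 0)
            return error;
    }
    /* the NAT operation itself would run here */
    return 0;
}
```

With this shape, a NAT request issued before "ipf -E" either brings the filter
up first or reports the initialization failure as EIO to the caller.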
 1.65.2.14 11-Dec-2002  thorpej Sync with HEAD.
 1.65.2.13 11-Nov-2002  nathanw Catch up to -current
 1.65.2.12 20-Sep-2002  thorpej Sync with HEAD.
 1.65.2.11 17-Sep-2002  nathanw Catch up to -current.
 1.65.2.10 20-Jun-2002  nathanw Catch up to -current.
 1.65.2.9 04-May-2002  thorpej Update from trunk.
 1.65.2.8 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.65.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.65.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.65.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.65.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.65.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.65.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.65.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.68.4.2 01-Oct-2001  fvdl Catch up with -current.
 1.68.4.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.68.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.68.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.68.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.68.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.79.4.2 15-Aug-2003  tron Pull up revision 1.94 (requested by martti in ticket #1411):
Fix return-rst for IPv6 (PR#22157 by Peter Postma).
 1.79.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.79.2.5 29-Aug-2002  gehenna catch up with -current.
 1.79.2.4 15-Jul-2002  gehenna catch up with -current.
 1.79.2.3 20-Jun-2002  gehenna catch up with -current.
 1.79.2.2 30-May-2002  gehenna Catch up with -current.
 1.79.2.1 16-May-2002  gehenna Add the character device switch.
 1.93.2.2 03-Aug-2004  skrll Sync with HEAD
 1.93.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.57 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.56 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.55 10-May-2004  christos PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.54 28-Mar-2004  martti branches: 1.54.2;
Upgraded IPFilter to 4.1.1
 1.53 24-Feb-2004  wiz occured -> occurred. From Peter Postma.
 1.52 29-Jun-2003  fvdl branches: 1.52.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.51 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.50 26-Jun-2003  itojun tabify
 1.49 29-Sep-2002  martti Remove unused ipl_usec.
 1.48 25-Sep-2002  martti Fix ipmon problems on 64-bit platforms (PR#17403 and PR#17404).
 1.47 19-Sep-2002  martti Resync with official IPF
 1.46 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.45 01-Jul-2002  christos Fix iplog problem on sparc64 [from Tomi Nylund]
1. size_t is 64 bits, so use a u_32_t for iplused
2. microtime() and friends expect a struct timeval,
passing the first of two unsigned longs will not cut it.
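
The two points above can be illustrated with a small userland sketch. It is
not the ip_log.c code: struct model_log_hdr and model_log_stamp() are
hypothetical, uint32_t stands in for IP Filter's u_32_t, and gettimeofday()
stands in for the kernel's microtime().

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical log header, illustrating the two fixes only. */
struct model_log_hdr {
    uint32_t used;          /* always 32 bits, unlike size_t on LP64 */
    struct timeval stamp;   /* a real struct timeval, not two longs */
};

/* Record how many bytes are in use and timestamp the header. */
int model_log_stamp(struct model_log_hdr *h, size_t nbytes)
{
    if (nbytes > UINT32_MAX)
        return -1;          /* would not fit in the 32-bit field */
    h->used = (uint32_t)nbytes;
    /* Pass the struct itself; its layout is platform-defined. */
    return gettimeofday(&h->stamp, NULL);
}
```

Using a fixed-width field keeps the record layout identical on 32-bit and
64-bit platforms, which matters when userland tools read the same records.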
 1.44 02-May-2002  martti branches: 1.44.2; 1.44.4;
Upgraded IPFilter to 3.4.27
 1.43 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.42 24-Jan-2002  martti Re-sync with IPFilter
 1.41 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.40 16-Sep-2001  wiz Spell 'occurred' with two 'r's.
 1.39 26-Mar-2001  mike branches: 1.39.2; 1.39.4;
Resolve conflicts.
 1.38 11-Nov-2000  thorpej branches: 1.38.2;
Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, and mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
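
The registration scheme described above can be sketched as a small userland
model. This is not the pfil(9) API: model_hook_add(), model_run_hooks(),
count_hook(), MODEL_DLT_RAW, the placeholder struct definitions and the
numeric values of PFIL_IN and PFIL_OUT are all assumptions for illustration.
Only the hook argument list (an argument, an mbuf **, an ifnet * and a
direction) and the key/dlt pairing are taken from the log message.

```c
#include <stddef.h>
#include <sys/socket.h>     /* AF_INET, AF_INET6 */

struct mbuf  { int placeholder; };   /* stand-in for the kernel mbuf */
struct ifnet { int placeholder; };   /* stand-in for the kernel ifnet */

#define PFIL_IN        1    /* illustrative values only */
#define PFIL_OUT       2
#define MODEL_DLT_RAW 12    /* hypothetical stand-in for DLT_RAW */

typedef int (*hook_fn)(void *arg, struct mbuf **mp, struct ifnet *ifp,
    int dir);

struct hook { int key; int dlt; hook_fn fn; void *arg; };

#define MAX_HOOKS 8
static struct hook hooks[MAX_HOOKS];
static size_t nhooks;

/* Register a hook for one (key, dlt) pair; several may coexist. */
int model_hook_add(int key, int dlt, hook_fn fn, void *arg)
{
    if (nhooks == MAX_HOOKS)
        return -1;
    hooks[nhooks++] = (struct hook){ key, dlt, fn, arg };
    return 0;
}

/* Run every hook matching (key, dlt), in registration order. */
int model_run_hooks(int key, int dlt, struct mbuf **mp, struct ifnet *ifp,
    int dir)
{
    for (size_t i = 0; i < nhooks; i++) {
        if (hooks[i].key != key || hooks[i].dlt != dlt)
            continue;
        int rv = hooks[i].fn(hooks[i].arg, mp, ifp, dir);
        /* Assumed convention: stop on error or a consumed packet. */
        if (rv != 0 || *mp == NULL)
            return rv;
    }
    return 0;
}

/* Example hook: counts packets through the int passed as its argument. */
int count_hook(void *arg, struct mbuf **mp, struct ifnet *ifp, int dir)
{
    (void)mp; (void)ifp; (void)dir;
    ++*(int *)arg;
    return 0;
}
```

Because a hook receives its own argument and a pointer to the packet pointer,
it needs no IP-specific knowledge, which is the point of the restructuring.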
 1.37 12-Jun-2000  veego branches: 1.37.2;
Resolve conflicts.
 1.36 23-May-2000  veego branches: 1.36.2;
Resolve conflicts.
 1.35 03-May-2000  veego Resolve conflicts.
 1.34 01-Feb-2000  veego Resolve conflicts.
 1.33 28-Dec-1999  darrenr update ipfilter code to 3.3.6
 1.32 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.31 11-Dec-1998  mrg branches: 1.31.2; 1.31.8; 1.31.14;
remove this insanity. appeared with ipfilter 3.2.10...
 1.30 11-Dec-1998  drochner correction to the previous: protect against _LKM too
pointed out by Todd Whitesel <toddpw@best.com>
 1.29 11-Dec-1998  drochner correction to previous: don't try to include kernel option headers in
userland
fixes PR kern/6561 (Takahiro Kambe)
 1.28 10-Dec-1998  christos defopt
 1.27 22-Nov-1998  mrg merge ipf 3.2.10
 1.26 12-Jul-1998  veego Resolve conflicts from the import.
 1.25 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.24 17-May-1998  veego Resolve conflicts
 1.23 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.22 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.21 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.20 21-Sep-1997  veego branches: 1.20.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.19 08-Jul-1997  mrg branches: 1.19.2;
put back IPFILTER_DEFAULT_BLOCK, as documented in options(4).
 1.18 07-Jul-1997  veego Use FR_PASS for IPF_DEFAULT_PASS. This can be overwritten with an
options IPF_DEFAULT_PASS=FR_BLOCK in your config file.
 1.17 06-Jul-1997  thorpej Restore original RCS IDs.
 1.16 06-Jul-1997  thorpej - Add a missing #ifdef SOLARIS
- Properly prototype ipfilterattach()/iplattach().
 1.15 05-Jul-1997  darrenr fix conflicts from import
 1.14 28-May-1997  thorpej Put the #ifndef _KERNEL prototype of get_unit() in <netinet/ip_fil.h>
since it is needed by other files, in order to compile on 64-bit
architectures.
 1.13 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.12 27-May-1997  thorpej Make this compile on 32-bit architectures again:
- Don't prototype functions that don't exist, and do prototype those
that do.
- Get ioctl arguments right (cmd is a u_long in NetBSD).
 1.11 25-May-1997  darrenr fix conflicts
 1.10 15-Apr-1997  christos - Fix indentation of the nested conditionals. It was inconsistent in places.
- Make this compile and work without IPFILTER_LOG, and disable logging by
default. This can be re-enabled now as a kernel option.
 1.9 29-Mar-1997  thorpej Define control device names here; they're needed by kernel and userland.
 1.8 29-Mar-1997  darrenr use IPLLOG instead of ipllog to easily mask parameters, fix up prototype
problems for compiling to user programs.
 1.7 29-Mar-1997  thorpej Centralize the check for NetBSD PFIL_HOOKS code into ip_fil.h, and use
it consistently.
 1.6 29-Mar-1997  thorpej Fix an ... interesting bug that resulted from namespace collision.
Description:

- A BSD pseudo-device initialization routine is declared as
void <pseudo-device name>attach __P((int count));
in ioconf.c by config(8). main() calls these functions
from a table.

- IP Filter has functions iplattach() and ipldetach() (or,
in the NetBSD case, were erroneously renamed ipfilterattach()
and ipfilterdetach()). These functions are used to establish
and disestablish the IP Filter "filter rule check" hook in
the IP input/output stream. They are declared:
int iplattach __P((void));
int ipldetach __P((void));
..and are expected to return a value by iplioctl().

- When main() calls (by sheer coincidence!) iplattach(),
the filter hook is established, and the IP Filter machinery
labeled as "initialized". This causes all packets, whether or
not the user intends to use filter rules, to be passed to
the filter rule checker if "ipfilter" is configured into the
kernel.

- As a result of the above, a kludge existed to default to
passing all packets (I can only assume that when this was
originally committed, the symptom of the bug was noticed by
the integrator, but the bug not actually found/fixed).

- In iplioctl(), if the SIOCFRENB ioctl is issued with an
argument of "enable" (i.e. user executed "ipf -E"), iplattach()
will notice that the machinery is already initialized and
return EBUSY.

Fix:

- Rename iplattach()/ipldetach() to ipl_enable() and ipl_disable().

- Create a pseudo-device entry stub named ipfilterattach()
(NetBSD case) or iplattach() (all other). This is a noop; none
of the machinery should be initialized until the caller explicitly
enables the filter with ipf -E. Add a comment to note that.
 1.5 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.4 19-Feb-1997  scottr Don't include ipfilter.h if building an LKM.
 1.3 18-Feb-1997  mrg pseudo-device ipfilter brings in PFIL_HOOKS.
 1.2 05-Jan-1997  veego branches: 1.2.4;
Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.25 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.24 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.23 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.22 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.21 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.20 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.19 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.18 12-Jun-2000  veego Import IP Filter 3.4.6
 1.1.1.17 23-May-2000  veego Import IP Filter 3.4.4
 1.1.1.16 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.15 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.14 28-Dec-1999  darrenr update DARRENR branch of netinet to 3.3.6
 1.1.1.13 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.12 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.11 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.10 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.2.4.1 12-Mar-1997  is Merge in changes from Trunk
 1.19.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.20.2.5 12-Dec-1998  cgd patch to keep opt_pfil_hooks.h from being required. it's not created on
1.3.x. (mrg)
 1.20.2.4 24-Nov-1998  cgd pull up rev(s) 1.27 from trunk (ipfilter 3.2.10). (mrg)
 1.20.2.3 22-Jul-1998  mellon Pull up 1.26 (veego)
 1.20.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.20.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.31.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.31.8.3 27-Mar-2001  bouyer Sync with HEAD.
 1.31.8.2 22-Nov-2000  bouyer Sync with HEAD.
 1.31.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.31.2.2 08-Jan-2000  he Pull up revision 1.33 (requested by darrenr):
Update IPF to version 3.3.6.
 1.31.2.1 20-Dec-1999  he Pull up revision 1.32 (requested by darrenr):
Update IPF to version 3.3.5.
 1.36.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.37.2.2 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.37.2.1 09-Feb-2002  he Pull up revisions 1.38-1.42 (requested by martti):
Updated IPFilter to 3.4.23
 1.38.2.8 18-Oct-2002  nathanw Catch up to -current.
 1.38.2.7 20-Sep-2002  thorpej Sync with HEAD.
 1.38.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.38.2.5 04-May-2002  thorpej Update from trunk.
 1.38.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.38.2.3 28-Feb-2002  nathanw Catch up to -current.
 1.38.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.38.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.39.4.2 01-Oct-2001  fvdl Catch up with -current.
 1.39.4.1 07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.39.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.39.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.39.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.39.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.39.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.44.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.44.2.1 15-Jul-2002  gehenna catch up with -current.
 1.52.2.5 19-Oct-2004  skrll Sync with HEAD
 1.52.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.52.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.52.2.2 03-Aug-2004  skrll Sync with HEAD
 1.52.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.54.2.2 13-Aug-2004  jmc branches: 1.54.2.2.2;
Pullup rev 1.56 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.54.2.1 30-May-2004  tron Pull up revision 1.55 (requested by christos in ticket #416):
PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.54.2.2.2.1 06-Feb-2005  jmc Pull up revision 1.57 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.17 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.16 06-Sep-2004  yamt fr_check_wrapper: as ipf modifies application data as well when
doing application proxy, it's needed to ensure that the whole packet
is writable here.
 1.15 06-Sep-2004  yamt fr_check_wrapper, fr_check_wrapper6:
ensure that mbufs are writable beforehand as ipf assumes.
PR/26773 and PR/26850.
 1.14 04-Aug-2004  christos remove the avail = 0; assignment which is superfluous. pointed out by enami.
 1.13 03-Aug-2004  christos PR/26471: Arto Selonen: ipfilter 4.1.3 crashes the system every few hours
Remove extraneous m = NULL assignment that will cause a NULL dereference
later.
 1.12 23-Jul-2004  martti branches: 1.12.2;
Upgraded IPFilter to 4.1.3
 1.11 16-Jun-2004  tron Don't leak mbuf if ipfr_fastroute6() fails.

Reviewed by Steve Woodford.
 1.10 20-May-2004  christos PR/25622: IPV6 return RST and through cloned interfaces was broken.
- checksum was computed incorrectly.
- ipv6 packet was not initialized properly.
- fixed code to be more similar to the v4 counterpart.
 1.9 18-May-2004  christos - remove superfluous assignment
- rt_gateway is already a pointer to struct sockaddr; don't take its address
when assigning it to struct sockaddr_in *
 1.8 09-May-2004  taca Make it compile without warning; void functions fr_checkv4sum() and
fr_checkv6sum() should not return a value.
 1.7 09-May-2004  christos PR/24981: Steven M. Bellovin: ipfilter in 2.0 branch panics the system
patch applied.
 1.6 09-May-2004  christos PR/25332: HIROSE yuuji: "fastroute(to)" in ipf.conf doesn't work; patch applied
 1.5 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.4 01-Apr-2004  martin Untangle ioctl copyin/copyout confusion. IP-Filter now actually works
on sparc64 (and probably everywhere else).
 1.3 28-Mar-2004  martti branches: 1.3.2;
Sync with official IPFilter
 1.2 28-Mar-2004  martti Upgraded IPFilter to 4.1.1
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.2 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.1 28-Mar-2004  martti Import IPFilter 4.1.1
 1.3.2.11 12-Nov-2004  jmc branches: 1.3.2.11.2;
Pullup patch (requested by darrenr in ticket #934)

build a new fr_info_t structure in fr_send_ip() and pass it through to
the fastroute function so that it uses accurate packet information about
the packet being sent out rather than the packet received (impacts both
return-rst and return-icmp features.) PR#27093
 1.3.2.10 11-Sep-2004  he Pull up revision 1.16 (requested by yamt in ticket #833):
Ensure that the whole packet is writable when IPF wants to
modify application data.
 1.3.2.9 11-Sep-2004  he Pull up revision 1.15 (requested by yamt in ticket #833):
Ensure that mbufs are writable, as IPF assumes so.
Fixes PR#26773 and PR#26850.
 1.3.2.8 13-Aug-2004  jmc Pullup rev 1.12-1.14 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.3.2.7 18-Jun-2004  grant Pull up revision 1.11 (requested by tron in ticket #501):

Don't leak mbuf if ipfr_fastroute6() fails.
 1.3.2.6 30-May-2004  tron Pull up revision 1.10 (requested by christos in ticket #416):
PR/25622: IPV6 return RST and through cloned interfaces was broken.
- checksum was computed incorrectly.
- ipv6 packet was not initialized properly.
- fixed code to be more similar to the v4 counterpart.
 1.3.2.5 30-May-2004  tron Pull up revision 1.9 (requested by christos in ticket #416):
- remove superfluous assignment
- rt_gateway is already a pointer to struct sockaddr; don't take its address
when assigning it to struct sockaddr_in *
 1.3.2.4 30-May-2004  tron Pull up revision 1.8 (requested by christos in ticket #416):
Make it compile without warning; void functions fr_checkv4sum() and
fr_checkv6sum() should not return a value.
 1.3.2.3 30-May-2004  tron Pull up revision 1.7 (requested by christos in ticket #416):
PR/24981: Steven M. Bellovin: ipfilter in 2.0 branch panics the system
patch applied.
 1.3.2.2 30-May-2004  tron Pull up revision 1.6 (requested by christos in ticket #416):
PR/25332: HIROSE yuuji: "fastroute(to)" in ipf.conf doesn't work; patch applied
 1.3.2.1 01-Apr-2004  tron Pull up revision 1.4 (requested by martin in ticket #39):
Untangle ioctl copyin/copyout confusion. IP-Filter now actually works
on sparc64 (and probably everywhere else).
 1.3.2.11.2.1 06-Feb-2005  jmc Pull up revision 1.17 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.12.2.7 19-Oct-2004  skrll Sync with HEAD
 1.12.2.6 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.2.5 18-Sep-2004  skrll Sync with HEAD.
 1.12.2.4 12-Aug-2004  skrll Sync with HEAD.
 1.12.2.3 05-Aug-2004  skrll Fix merge mistakes.
 1.12.2.2 03-Aug-2004  skrll Sync with HEAD
 1.12.2.1 23-Jul-2004  skrll file ip_fil_netbsd.c was added on branch ktrace-lwp on 2004-08-03 10:54:38 +0000
 1.86 29-Jun-2024  riastradh netinet: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.85 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
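
The difference between a sizeof-based and an alignof-based check, as described in revision 1.85, can be illustrated in user space. This is only a sketch: the macro names, the `struct pair32` type, and the `DEMO_NO_STRICT_ALIGNMENT` switch are hypothetical and are not the kernel's definitions.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical composite type: sizeof is 8, ABI alignment is that of uint32_t. */
struct pair32 {
	uint32_t x;
	uint32_t y;
};

/* Old-style check: treats sizeof(t) as the alignment requirement. */
#define DEMO_ALIGNED_BY_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* New-style check: uses the alignment the compiler actually requires for t. */
#define DEMO_ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)

/*
 * Variant that always reports "aligned" when the (assumed) build switch
 * says the CPU tolerates unaligned accesses.
 */
#ifdef DEMO_NO_STRICT_ALIGNMENT
#define DEMO_ACCESSIBLE(p, t)	((void)(p), true)
#else
#define DEMO_ACCESSIBLE(p, t)	DEMO_ALIGNED_BY_ALIGNOF(p, t)
#endif
```

For a pointer that sits 4 bytes into an 8-byte-aligned buffer, the sizeof-based macro rejects a `struct pair32` even though its members only need 4-byte alignment; the alignof-based macro accepts it.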
 1.84 15-Feb-2021  knakahara Fix build failure for options GATEWAY.
 1.83 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.82 11-Apr-2018  maxv branches: 1.82.14;
Remove whitespaces/tabs, and one non-ASCII character.
 1.81 17-Nov-2017  ozaki-r branches: 1.81.2;
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.80 07-Feb-2017  ozaki-r branches: 1.80.6;
Add missing NULL checks for m_get_rcvif
 1.79 11-Jan-2017  ozaki-r branches: 1.79.2;
Get rid of unnecessary header inclusions
 1.78 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.
 1.77 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.76 01-Aug-2016  knakahara improve fast-forward performance when the number of flows exceeds IPFLOW_MAX.

In the fast-forward case, when the number of flows exceeds IPFLOW_MAX, the
performance degraded to about 50% compared to the case less than IPFLOW_MAX
flows. This modification suppresses the degradation to 65%. Furthermore,
the modified kernel is about the same performance as the original kernel
when the number of flows is less than IPFLOW_MAX.

The original patch is implemented by ryo@n.o. Thanks.
 1.75 27-Jul-2016  knakahara remove extra ifdefs. no functional changes.

ip_flow.c becomes build target only if GATEWAY kernel option is on.
So, "#ifdef GATEWAY" in ip_flow.c is not needed.
 1.74 26-Jul-2016  ozaki-r Simplify by using atomic_swap instead of mutex

Suggested by kefren@
 1.73 11-Jul-2016  ozaki-r branches: 1.73.2;
Run timers in workqueue

Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).

Doing workqueue_enqueue a work twice (i.e., call workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.

Proposed on tech-net and tech-kern.
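
The double-enqueue guard described in revision 1.73 can be sketched in user space as follows. All names here (`demo_timer`, `demo_callout_tick`, `demo_work`) are hypothetical stand-ins, and a counter stands in for the real workqueue_enqueue() call. The flag is tested and set with a C11 atomic exchange, in the spirit of the atomic_swap simplification mentioned in revision 1.74; this is not the kernel code itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-protocol state for a slow timer driven by a callout. */
struct demo_timer {
	atomic_bool enqueued;	/* true while a work item is pending */
	int enqueue_calls;	/* times the work was actually queued */
	int runs;		/* times the work body ran */
};

static void
demo_timer_init(struct demo_timer *t)
{
	atomic_init(&t->enqueued, false);
	t->enqueue_calls = 0;
	t->runs = 0;
}

/*
 * Periodic callout: queue the work only if none is pending.
 * Returns true if the work was queued on this tick.
 */
static bool
demo_callout_tick(struct demo_timer *t)
{
	/* The exchange returns the old value; true means work is pending. */
	if (atomic_exchange(&t->enqueued, true))
		return false;
	t->enqueue_calls++;	/* stands in for workqueue_enqueue() */
	return true;
}

/* Work body, run in thread context; clears the flag once finished. */
static void
demo_work(struct demo_timer *t)
{
	t->runs++;		/* the slow-timer processing would go here */
	atomic_store(&t->enqueued, false);
}
```

A tick that arrives while the previous work is still pending is simply skipped; the next tick after demo_work() completes queues the work again.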
 1.72 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.71 13-Jun-2016  knakahara eliminate unnecessary splnet
 1.70 13-Jun-2016  knakahara MP-ify fastforward to support GATEWAY kernel option.

I add "ipflow_lock" mutex in ip_flow.c and "ip6flow_lock" mutex in ip6_flow.c
to protect all data in each file. Of course, this is not MP-scalable. However,
it is sufficient as tentative workaround. We should make it scalable somehow
in the future.

ok by ozaki-r@n.o.
 1.69 13-Jun-2016  knakahara make ipflow_reap() static function.
 1.68 13-Jun-2016  knakahara remove unnecessary splnet before pool_{get,put}
 1.67 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
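
The idea behind revision 1.67, storing an index in the packet and resolving it through a table at use time, can be sketched as below. The types and functions are hypothetical, index 0 is assumed to mean "no interface", and the psref(9)/pserialize(9) synchronization that the real APIs provide is deliberately left out.

```c
#include <stddef.h>

#define DEMO_MAXIF 8

struct demo_ifnet {
	const char *name;
};

/* The packet records an interface index, not a pointer. */
struct demo_pkt {
	unsigned rcvif_index;
};

/* Hypothetical index-to-interface table; slot 0 is never used. */
static struct demo_ifnet *demo_ifindex2ifnet[DEMO_MAXIF];

/* Register an interface; returns its index, or 0 if the table is full. */
static unsigned
demo_if_attach(struct demo_ifnet *ifp)
{
	for (unsigned i = 1; i < DEMO_MAXIF; i++) {
		if (demo_ifindex2ifnet[i] == NULL) {
			demo_ifindex2ifnet[i] = ifp;
			return i;
		}
	}
	return 0;
}

/* Remove an interface; packets still carrying its index now resolve to NULL. */
static void
demo_if_detach(unsigned idx)
{
	if (idx < DEMO_MAXIF)
		demo_ifindex2ifnet[idx] = NULL;
}

/* Resolve the receiving interface; callers must handle a NULL result. */
static struct demo_ifnet *
demo_get_rcvif(const struct demo_pkt *m)
{
	if (m->rcvif_index == 0 || m->rcvif_index >= DEMO_MAXIF)
		return NULL;
	return demo_ifindex2ifnet[m->rcvif_index];
}
```

Because the lookup can fail once the interface is gone, every caller needs a NULL check, which is the kind of check revision 1.80 adds for m_get_rcvif.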
 1.66 23-Mar-2015  roy Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.
 1.65 18-Oct-2014  snj branches: 1.65.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.64 22-May-2014  rmind branches: 1.64.2;
- Add in_init() and move some functions, variables and sysctls into in.c
where they belong to. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.63 01-Apr-2014  pooka branches: 1.63.2;
Wrap ipflow_create() & ip6flow_create() in kernel lock. Prevents the
interrupt side on another core from seeing the situation while the ipflow
is being modified.
 1.62 19-Mar-2014  liamjfoy Move ipflow into ip_var.h and fix confliction
 1.61 19-Mar-2014  liamjfoy Remove ipflow_prune and replace with ipflow_reap. ok rmind@
 1.60 19-Jan-2012  liamjfoy branches: 1.60.6; 1.60.10;
Remove ipf_start from ipf struct
 1.59 01-Apr-2010  tls branches: 1.59.8; 1.59.12;
After discussion with ad@: it appears that KERNEL_LOCK also protects
the driver output path (that is, ifp->if_output()). In the case of
entry through the socket code, we are fine, because pru_usrreq takes
KERNEL_LOCK. However, there are a few other ways to cause output
which require protection:

1) direct calls to tcp_output() in tcp_input()
2) fast-forwarding code (ip_flow) -- protected elsewise
against itself by the softnet lock.
3) *Possibly* the ARP code. I have currently persuaded
myself that it is safe because of how it's called.
4) Possibly the ICMP code.

This change addresses #1 and #2.
 1.58 15-Mar-2009  cegger branches: 1.58.2; 1.58.4;
ansify function definitions
 1.57 01-Feb-2009  pooka branches: 1.57.2;
Init ipflow pool dynamically instead of using a linkset.
 1.56 28-Apr-2008  martin branches: 1.56.8;
Remove clause 3 and 4 from TNF licenses
 1.55 24-Apr-2008  ad branches: 1.55.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.54 12-Apr-2008  thorpej branches: 1.54.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
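
Per-CPU counters that are only summed when someone reads them, as described in revisions 1.52 and 1.54, can be sketched like this. The CPU count, the counter names, and the functions are hypothetical; the real kernel uses its own per-CPU infrastructure rather than a plain two-dimensional array.

```c
#include <stdint.h>

#define DEMO_NCPU 4

/* Hypothetical counter indices into an array of uint64_t's. */
enum {
	DEMO_STAT_TOTAL,
	DEMO_STAT_FORWARD,
	DEMO_NSTATS
};

/* One private counter array per CPU, so the hot path touches no shared line. */
static uint64_t demo_percpu[DEMO_NCPU][DEMO_NSTATS];

/* Hot path: bump a counter on the current CPU only. */
static void
demo_stat_inc(unsigned cpu, unsigned stat)
{
	demo_percpu[cpu][stat]++;
}

/* Slow path (e.g. a sysctl read): sum every CPU's array into one snapshot. */
static void
demo_stat_collate(uint64_t out[DEMO_NSTATS])
{
	for (unsigned s = 0; s < DEMO_NSTATS; s++) {
		out[s] = 0;
		for (unsigned c = 0; c < DEMO_NCPU; c++)
			out[s] += demo_percpu[c][s];
	}
}
```

The cost of summation moves from every packet to the comparatively rare read.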
 1.53 09-Apr-2008  thorpej - ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).
 1.52 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.51 04-Jan-2008  dyoung branches: 1.51.6;
Constify a bit.
 1.50 04-Jan-2008  dyoung Replace rtcache_down() with rtcache_validate() and update rtcache_down()
uses.
 1.49 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.48 20-Aug-2007  dyoung branches: 1.48.2; 1.48.8; 1.48.10; 1.48.14;
Don't call rtcache_check() from the fast-forward code, which runs
at IPL_NET, because rtcache_check() may read the forwarding table.
Elsewhere, the kernel only blocks interrupts at priority IPL_SOFTNET
and below while it modifies the forwarding table, so rtcache_check()
could be reading the table in an inconsistent state. Use
rtcache_done(), instead.

XXX netinet/ip_flow.c and netinet6/ip6_flow.c are virtually identical.
XXX They should share code.
 1.47 02-May-2007  dyoung branches: 1.47.2; 1.47.6;
Remove obsolete files netinet/in_route.[ch].
 1.46 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.45 05-Apr-2007  liamjfoy use size_t for indexes

just pass a *ip to ipflow_hash instead of members

ok christos@
 1.44 26-Mar-2007  liamjfoy Add a small note regarding further commented code in netinet6/ip6_flow.c
 1.43 25-Mar-2007  liamjfoy Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.
 1.42 12-Mar-2007  ad branches: 1.42.2; 1.42.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.41 04-Mar-2007  christos branches: 1.41.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.40 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.39 26-Jan-2007  dyoung branches: 1.39.2;
bzero -> memset
 1.38 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.37 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.36 06-Oct-2006  mrg add a missing semicolon from the previous commit.
 1.35 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.34 02-Sep-2006  liamjfoy branches: 1.34.2; 1.34.4;
increment ips_total too.

ok matt thomas
 1.33 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.32 24-Dec-2005  perry branches: 1.32.4; 1.32.6; 1.32.8; 1.32.14;
change comment from __const__ to const
 1.31 11-Dec-2005  christos merge ktrace-lwp.
 1.30 17-Oct-2005  christos small list macro cleanup:
- remove duplicate LIST_FIRST (Liam Foy)
- change to use LIST_FOREACH or for () instead of while () for consistency
 1.29 03-Feb-2005  perry branches: 1.29.6;
KNF + slightly ANSIfy
 1.28 25-Apr-2004  simonb branches: 1.28.4; 1.28.6;
Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.27 12-Dec-2003  scw Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.
 1.26 02-Nov-2002  perry branches: 1.26.6;
/*CONTCOND*/ while (0)'ed macros
 1.25 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
 1.24 09-Jun-2002  itojun whitespace
 1.23 08-Mar-2002  thorpej branches: 1.23.6;
Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.22 13-Nov-2001  lukem add RCSIDs
 1.21 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.20 17-Sep-2001  thorpej branches: 1.20.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.19 12-Jun-2001  wiz branches: 1.19.2; 1.19.4;
receive, not recieve
 1.18 02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
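
The split described in the 1.18 entry above (pre-compute the pseudo-header sum, finish the checksum later in software or hardware) can be illustrated with a minimal userland sketch. This is not the NetBSD code: the function names csum_add, csum_fold, pseudo_hdr_sum and l4_cksum_finish are hypothetical, and the sketch assumes IPv4 addresses passed as raw bytes in network order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum (big-endian 16-bit words). */
uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    if (i < len)                      /* odd trailing byte is padded with zero */
        sum += (uint32_t)(p[i] << 8);
    return sum;
}

/* Fold the carries back in until the sum fits in 16 bits. */
uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Step 1: the part that can be computed early and cached (not complemented). */
uint16_t pseudo_hdr_sum(const uint8_t src[4], const uint8_t dst[4],
                        uint8_t proto, uint16_t l4len)
{
    uint32_t sum = 0;

    sum = csum_add(sum, src, 4);
    sum = csum_add(sum, dst, 4);
    sum += proto;                     /* zero byte + protocol number */
    sum += l4len;                     /* TCP/UDP length */
    return csum_fold(sum);
}

/* Step 2: the delayed part, done in software when hardware cannot do it. */
uint16_t l4_cksum_finish(uint16_t pseudo, const uint8_t *seg, size_t len)
{
    assert(seg != NULL);
    return (uint16_t)~csum_fold(csum_add(pseudo, seg, len));
}
```

The segment passed to l4_cksum_finish is expected to have its checksum field zeroed; the result is what would be stored into that field.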
 1.17 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.16 30-Jun-2000  thorpej branches: 1.16.2;
Pass the correct destination address for the route-to-gateway case.
From Zdenek Salvet, kern/10483.
 1.15 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.14 17-Oct-1999  sommerfeld branches: 1.14.2; 1.14.10;
If a packet came in as link-level broadcast or link-level multicast, don't
attempt to fast-forward it out.
 1.13 26-Mar-1999  proff branches: 1.13.2; 1.13.8;
security: test for ip_len < ip_hl <<2 and drop packet accordingly
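
A minimal sketch of the kind of length check the 1.13 entry above refers to, written against a raw byte buffer rather than the kernel's struct ip. The function name ipv4_lengths_sane and the extra checks around the ip_len < ip_hl << 2 test are assumptions added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if the IPv4 header and total length fields are plausible. */
bool ipv4_lengths_sane(const uint8_t *pkt, size_t caplen)
{
    assert(pkt != NULL);
    if (caplen < 20)                  /* shorter than a minimal IPv4 header */
        return false;

    size_t hlen = (size_t)(pkt[0] & 0x0f) << 2;   /* ip_hl << 2 */
    size_t total = (size_t)pkt[2] << 8 | pkt[3];  /* ip_len, network order */

    if (hlen < 20)                    /* header length below the minimum */
        return false;
    if (total < hlen)                 /* the ip_len < ip_hl << 2 test */
        return false;
    if (total > caplen)               /* claims more data than was received */
        return false;
    return true;
}
```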
 1.12 28-Jan-1999  itohy ~htons(...) is always negative.
 1.11 25-Jan-1999  mycroft One more tweak to the checksum hack, and I promise I'm done. B-)
 1.10 25-Jan-1999  mycroft Absolutely minor tweak to generate better code.
 1.9 24-Jan-1999  mycroft Update the comment about the checksum hack. It was way out of date.
 1.8 24-Jan-1999  mycroft Modify the checksum slightly so that the htons()s can all be combined.
 1.7 08-Oct-1998  thorpej Use the pool allocator for ipflow entries.
 1.6 10-Jun-1998  sommerfe Truncate mbufs to the correct length before forwarding; fixes pr5560
 1.5 02-Jun-1998  thorpej In addition to the IP flow hash table, put the flows on a list. The table
is used for fast lookup, the list for traversal of all flows. Also, use
PRT timers.
 1.4 18-May-1998  matt Fix two bugs.
 1.3 04-May-1998  matt Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.
 1.2 04-May-1998  thorpej - kern/5380 (Dennis Ferguson): fix incremental IP header checksum.
- kern/5381 (Dennis Ferguson): check IP header checksum in fast forward
code.
- In ipflow_slowtimo(), if no IP flows are in use, don't bother checking
all of the hash buckets.
 1.1 29-Apr-1998  matt Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
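
The incremental checksum adjustment mentioned in the 1.1 entry above can be sketched as follows. This is an illustration using the update rule from RFC 1624 (equation 3) on a raw header buffer, not the code from ip_flow.c; the function names ip_cksum_full and ip_ttl_dec_incremental are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum over an even-length header (big-endian words). */
uint16_t ip_cksum_full(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2)
        sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Decrement the TTL (byte 8) and patch the header checksum (bytes 10-11)
 * without re-summing the header: HC' = ~(~HC + ~m + m'), where m is the
 * old TTL/protocol word and m' is the new one.
 */
void ip_ttl_dec_incremental(uint8_t *hdr)
{
    assert(hdr[8] > 0);               /* caller must not forward TTL 0 */

    uint16_t old_word = (uint16_t)(hdr[8] << 8 | hdr[9]);
    hdr[8]--;
    uint16_t new_word = (uint16_t)(hdr[8] << 8 | hdr[9]);
    uint16_t old_sum = (uint16_t)(hdr[10] << 8 | hdr[11]);

    uint32_t sum = (uint16_t)~old_sum;
    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    uint16_t new_sum = (uint16_t)~sum;
    hdr[10] = (uint8_t)(new_sum >> 8);
    hdr[11] = (uint8_t)(new_sum & 0xff);
}
```

A header whose checksum field is correct sums to zero under ip_cksum_full, which gives a simple way to check the patched header.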
 1.13.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.2.2 01-Jul-2000  he Pull up revision 1.16 (requested by thorpej):
Pass the correct destination address for the route-to-gateway
case. Fixes PR#10483.
 1.13.2.1 18-Oct-1999  cgd pull up rev 1.14 from trunk (requested by sommerfeld):
Multicast storm prevention: don't attempt to forward link-level
multicast packets which contain ip unicast packets; these packets
would only be generated from misconfigured/buggy systems.
 1.14.10.1 30-Jun-2000  thorpej Pull up rev. 1.16:
Pass the correct destination address for the route-to-gateway case.
From Zdenek Salvet, kern/10483.
 1.14.2.2 21-Apr-2001  bouyer Sync with HEAD
 1.14.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.16.2.7 11-Nov-2002  nathanw Catch up to -current
 1.16.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.16.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.16.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.16.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.16.2.2 21-Sep-2001  nathanw Catch up to -current.
 1.16.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.19.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.19.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.19.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.19.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.19.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.20.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.23.6.2 15-Jul-2002  gehenna catch up with -current.
 1.23.6.1 20-Jun-2002  gehenna catch up with -current.
 1.26.6.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.26.6.4 04-Feb-2005  skrll Sync with HEAD.
 1.26.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.26.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.26.6.1 03-Aug-2004  skrll Sync with HEAD
 1.28.6.1 12-Feb-2005  yamt sync with head.
 1.28.4.1 29-Apr-2005  kent sync with -current
 1.29.6.5 21-Jan-2008  yamt sync with head
 1.29.6.4 03-Sep-2007  yamt sync with head.
 1.29.6.3 26-Feb-2007  yamt sync with head.
 1.29.6.2 30-Dec-2006  yamt sync with head.
 1.29.6.1 21-Jun-2006  yamt sync with head.
 1.32.14.1 19-Jun-2006  chap Sync with head.
 1.32.8.2 03-Sep-2006  yamt sync with head.
 1.32.8.1 26-Jun-2006  yamt sync with head.
 1.32.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.32.4.1 09-Sep-2006  rpaulo sync with head
 1.34.4.3 18-Dec-2006  yamt sync with head.
 1.34.4.2 10-Dec-2006  yamt sync with head.
 1.34.4.1 22-Oct-2006  yamt sync with head
 1.34.2.3 01-Feb-2007  ad Sync with head.
 1.34.2.2 12-Jan-2007  ad Sync with head.
 1.34.2.1 18-Nov-2006  ad Sync with head.
 1.39.2.5 07-May-2007  yamt sync with head.
 1.39.2.4 15-Apr-2007  yamt sync with head.
 1.39.2.3 24-Mar-2007  yamt sync with head.
 1.39.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.39.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.41.2.4 09-Oct-2007  ad Sync with head.
 1.41.2.3 08-Jun-2007  ad Sync with head.
 1.41.2.2 10-Apr-2007  ad Sync with head.
 1.41.2.1 13-Mar-2007  ad Sync with head.
 1.42.4.1 29-Mar-2007  reinoud Pullup to -current
 1.42.2.1 11-Jul-2007  mjf Sync with head.
 1.47.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.47.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.48.14.2 08-Jan-2008  bouyer Sync with HEAD
 1.48.14.1 02-Jan-2008  bouyer Sync with HEAD
 1.48.10.1 26-Dec-2007  ad Sync with head.
 1.48.8.1 18-Feb-2008  mjf Sync with HEAD.
 1.48.2.1 09-Jan-2008  matt sync with HEAD
 1.51.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.54.2.1 18-May-2008  yamt sync with head.
 1.55.2.3 11-Aug-2010  yamt sync with head.
 1.55.2.2 04-May-2009  yamt sync with head.
 1.55.2.1 16-May-2008  yamt sync with head.
 1.56.8.2 28-Apr-2009  skrll Sync with HEAD.
 1.56.8.1 03-Mar-2009  skrll Sync with HEAD.
 1.57.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.58.4.1 30-May-2010  rmind sync with head
 1.58.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.59.12.1 18-Feb-2012  mrg merge to -current.
 1.59.8.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.59.8.1 17-Apr-2012  yamt sync with head
 1.60.10.2 18-May-2014  rmind sync with head
 1.60.10.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.60.6.2 03-Dec-2017  jdolecek update from HEAD
 1.60.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.63.2.1 10-Aug-2014  tls Rebase.
 1.64.2.1 12-May-2017  snj Pull up following revision(s) (requested by skrll/ozaki-r in ticket #1402):
sys/net/route.c: revision 1.170 via patch
sys/netinet/ip_flow.c: revision 1.73 via patch
sys/netinet6/ip6_flow.c: revision 1.28 via patch
sys/netinet6/nd6.c: revision 1.203 via patch
Run timers in workqueue
Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).
Doing workqueue_enqueue a work twice (i.e., call workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.
Proposed on tech-net and tech-kern.
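
The "enqueue only if the flag is false" guard described in the entry above can be sketched in portable C11. This is an illustration of the idea only: it uses <stdatomic.h> rather than any kernel primitive, and struct guarded_work, guarded_enqueue and guarded_work_done are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct guarded_work {
    atomic_bool enqueued;   /* true while a work item is pending or running */
    int         runs;       /* stand-in for the real timer work */
};

/*
 * Called from the periodic timer. Returns true if the caller won the right
 * to enqueue the work; returns false if a previous item is still pending.
 */
bool guarded_enqueue(struct guarded_work *gw)
{
    bool expected = false;

    assert(gw != NULL);
    /* Only the caller that flips false -> true may enqueue. */
    return atomic_compare_exchange_strong(&gw->enqueued, &expected, true);
}

/* Called from the worker context once the work has finished. */
void guarded_work_done(struct guarded_work *gw)
{
    assert(gw != NULL);
    gw->runs++;
    atomic_store(&gw->enqueued, false);
}
```

In this sketch a timer tick that arrives while the previous work is still pending is simply skipped, which matches the behaviour the commit message describes for the periodic ip{,6}flow_slowtimo callout.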
 1.65.2.6 28-Aug-2017  skrll Sync with HEAD
 1.65.2.5 05-Feb-2017  skrll Sync with HEAD
 1.65.2.4 05-Dec-2016  skrll Sync with HEAD
 1.65.2.3 05-Oct-2016  skrll Sync with HEAD
 1.65.2.2 09-Jul-2016  skrll Sync with HEAD
 1.65.2.1 06-Apr-2015  skrll Sync with HEAD
 1.73.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.73.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.73.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.73.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.79.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.80.6.1 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoid using it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.81.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.82.14.1 03-Apr-2021  thorpej Sync with HEAD.
 1.37 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.36 28-Mar-2004  martin branches: 1.36.4;
Cast 64 bit pointers only with (intptr_t) care.
 1.35 28-Mar-2004  martti Upgraded IPFilter to 4.1.1
 1.34 19-Sep-2002  martti branches: 1.34.6;
Resync with official IPF
 1.33 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.32 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.31 09-Jun-2002  itojun whitespace
 1.30 02-May-2002  martti branches: 1.30.2; 1.30.4;
Fix compilation problems
 1.29 02-May-2002  martti Upgraded IPFilter to 3.4.27
 1.28 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.27 24-Jan-2002  martti Re-sync with IPFilter
 1.26 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.25 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.24 13-Nov-2001  lukem add RCSIDs
 1.23 06-Apr-2001  darrenr branches: 1.23.2;
fix fragment cache security hole
 1.22 26-Mar-2001  mike Resolve conflicts.
 1.21 12-Jun-2000  veego branches: 1.21.2; 1.21.4;
Ups, forgot to resolve one place.
 1.20 12-Jun-2000  veego Resolve conflicts.
 1.19 11-May-2000  veego branches: 1.19.2;
Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.18 03-May-2000  veego Resolve conflicts.
 1.17 24-Mar-2000  thorpej Pull in <sys/callout.h> for the benefit of userland.
 1.16 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
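
The two properties listed in the 1.16 entry above (caller-supplied handle storage, constant-time insertion and removal) can be illustrated with an intrusive doubly linked list. This sketch is not the callout(9) implementation and does not model timing at all; struct tmo, struct tmo_bucket, tmo_insert and tmo_remove are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Handle embedded in the client's own structure, so nothing is allocated. */
struct tmo {
    struct tmo  *next;
    struct tmo **prevp;     /* address of the pointer that points at us */
    void       (*fn)(void *);
    void        *arg;
};

struct tmo_bucket {
    struct tmo *head;
};

/* O(1): link the handle at the head of the bucket. */
void tmo_insert(struct tmo_bucket *b, struct tmo *t,
                void (*fn)(void *), void *arg)
{
    t->fn = fn;
    t->arg = arg;
    t->next = b->head;
    t->prevp = &b->head;
    if (b->head != NULL)
        b->head->prevp = &t->next;
    b->head = t;
}

/* O(1): unlink the handle without walking the list. */
void tmo_remove(struct tmo *t)
{
    assert(t->prevp != NULL);         /* must currently be linked */
    *t->prevp = t->next;
    if (t->next != NULL)
        t->next->prevp = t->prevp;
    t->next = NULL;
    t->prevp = NULL;
}
```

Because the handle carries its own back-pointer, removal needs neither a search nor a reference to the bucket.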
 1.15 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.14 02-Feb-1999  cjs branches: 1.14.2; 1.14.6; 1.14.8; 1.14.14;
Remove SCCS markers and make these compile in $NetBSD$ IDs.
 1.13 22-Nov-1998  mrg merge ipf 3.2.10
 1.12 12-Jul-1998  veego Resolve conflicts from the import.
 1.11 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.10 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.9 21-Sep-1997  veego branches: 1.9.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.8 06-Jul-1997  thorpej branches: 1.8.2;
Restore original RCS IDs.
 1.7 05-Jul-1997  darrenr fix conflicts from import
 1.6 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.5 27-May-1997  thorpej Make this compile on 32-bit architectures again:
- garbage-collect unused variables.
 1.4 25-May-1997  darrenr fix conflicts
 1.3 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.2 05-Jan-1997  veego Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.21 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.20 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.19 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.18 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.17 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.16 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.15 12-Jun-2000  veego Import IP Filter 3.4.6
 1.1.1.14 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.13 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.12 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.11 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.10 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.8.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.9.2.4 24-Nov-1998  cgd pull up rev(s) 1.13 from trunk (ipfilter 3.2.10). (mrg)
 1.9.2.3 22-Jul-1998  mellon Pull up 1.12 (veego)
 1.9.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.9.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.14.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.14.8.3 21-Apr-2001  bouyer Sync with HEAD
 1.14.8.2 27-Mar-2001  bouyer Sync with HEAD.
 1.14.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.14.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.14.2.2 14-Apr-2001  he Pull up revision 1.23 (via patch, requested by darrenr):
Fix bug related to fragment cache handling.
 1.14.2.1 20-Dec-1999  he Pull up revision 1.15 (requested by darrenr):
Update IPF to version 3.3.5.
 1.19.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.21.4.9 20-Sep-2002  thorpej Sync with HEAD.
 1.21.4.8 27-Aug-2002  nathanw Catch up to -current.
 1.21.4.7 20-Jun-2002  nathanw Catch up to -current.
 1.21.4.6 04-May-2002  thorpej Update from trunk.
 1.21.4.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.21.4.4 28-Feb-2002  nathanw Catch up to -current.
 1.21.4.3 08-Jan-2002  nathanw Catch up to -current.
 1.21.4.2 14-Nov-2001  nathanw Catch up to -current.
 1.21.4.1 09-Apr-2001  nathanw Catch up with -current.
 1.21.2.3 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.21.2.2 09-Feb-2002  he Pull up revisions 1.22,1.24-1.27 (via patch, requested by martti):
Updated IPFilter to 3.4.23.
 1.21.2.1 14-Apr-2001  he Pull up revision 1.23 (requested by darrenr):
Fix bug related to fragment cache handling.
 1.23.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.23.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.23.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.23.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.23.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.30.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.30.2.2 29-Aug-2002  gehenna catch up with -current.
 1.30.2.1 20-Jun-2002  gehenna catch up with -current.
 1.34.6.4 19-Oct-2004  skrll Sync with HEAD
 1.34.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.34.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.34.6.1 03-Aug-2004  skrll Sync with HEAD
 1.36.4.1 06-Feb-2005  jmc Pull up revision 1.37 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.21 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.20 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.19 28-Mar-2004  martti branches: 1.19.2;
Upgraded IPFilter to 4.1.1
 1.18 19-Sep-2002  martti branches: 1.18.6;
Upgraded IPFilter to 3.4.29
 1.17 24-Jan-2002  martti branches: 1.17.10;
Upgraded IPFilter to 3.4.23
 1.16 06-Apr-2001  darrenr branches: 1.16.2;
fix fragment cache security hole
 1.15 26-Mar-2001  mike Resolve conflicts.
 1.14 03-May-2000  veego branches: 1.14.4; 1.14.6;
Resolve conflicts.
 1.13 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.12 22-Nov-1998  mrg branches: 1.12.4; 1.12.10; 1.12.16;
merge ipf 3.2.10
 1.11 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.10 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.9 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.8 21-Sep-1997  veego branches: 1.8.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.7 06-Jul-1997  thorpej branches: 1.7.2;
Restore original RCS IDs.
 1.6 05-Jul-1997  darrenr fix conflicts from import
 1.5 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.4 25-May-1997  darrenr fix conflicts
 1.3 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.2 05-Jan-1997  veego Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.19 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.18 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.17 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.16 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.15 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.14 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.13 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.12 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.11 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.10 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.7.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.8.2.2 24-Nov-1998  cgd pull up rev(s) 1.12 from trunk (ipfilter 3.2.10). (mrg)
 1.8.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.12.16.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.10.3 21-Apr-2001  bouyer Sync with HEAD
 1.12.10.2 27-Mar-2001  bouyer Sync with HEAD.
 1.12.10.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.2 14-Apr-2001  he Pull up revision 1.16 (requested by darrenr):
Fix bug related to fragment cache handling.
 1.12.4.1 20-Dec-1999  he Pull up revision 1.13 (requested by darrenr):
Update IPF to version 3.3.5.
 1.14.6.3 20-Sep-2002  thorpej Sync with HEAD.
 1.14.6.2 28-Feb-2002  nathanw Catch up to -current.
 1.14.6.1 09-Apr-2001  nathanw Catch up with -current.
 1.14.4.3 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.14.4.2 09-Feb-2002  he Pull up revisions 1.15-1.17 (requested by martti):
Updated IPFilter to 3.4.23
 1.14.4.1 14-Apr-2001  he Pull up revision 1.16 (requested by darrenr):
Fix bug related to fragment cache handling.
 1.16.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.16.2.1 11-Feb-2002  jdolecek Sync w/ -current.
 1.17.10.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.18.6.2 19-Oct-2004  skrll Sync with HEAD
 1.18.6.1 03-Aug-2004  skrll Sync with HEAD
 1.19.2.1 13-Aug-2004  jmc branches: 1.19.2.1.2;
Pullup rev 1.20 (requested by christos in ticket #1727)

Sync up w. ipf 4.1.3
 1.19.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.21 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.29 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.28 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.27 28-Mar-2004  martti branches: 1.27.2;
Upgraded IPFilter to 4.1.1
 1.26 19-Sep-2002  martti branches: 1.26.6;
Resync with official IPF
 1.25 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.24 02-May-2002  martti branches: 1.24.4;
Fix compilation problems
 1.23 02-May-2002  martti Upgraded IPFilter to 3.4.27
 1.22 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.21 24-Jan-2002  martti Re-sync with IPFilter
 1.20 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.19 13-Nov-2001  lukem add RCSIDs
 1.18 26-Mar-2001  mike branches: 1.18.2;
Resolve conflicts.
 1.17 09-Aug-2000  veego branches: 1.17.2;
Resolve conflicts.
 1.16 21-May-2000  veego branches: 1.16.4;
Resolve conflicts.
 1.15 11-May-2000  veego Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.14 03-May-2000  veego Resolve conflicts.
 1.13 30-Mar-2000  augustss Remove register declarations.
 1.12 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.11 22-Nov-1998  mrg branches: 1.11.4; 1.11.10; 1.11.16;
add two more prototypes. noted missing by mjacob.
 1.10 22-Nov-1998  mrg merge ipf 3.2.10
 1.9 12-Jul-1998  veego Resolve conflicts from the import.
 1.8 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.7 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.6 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.5 21-Sep-1997  veego branches: 1.5.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.4 05-Jul-1997  darrenr branches: 1.4.2;
fix conflicts from import
 1.3 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.2 27-May-1997  thorpej Make this compile on 32-bit architectures:
- Add prototypes.
- garbage-collect unused variables.
 1.1 26-May-1997  darrenr branches: 1.1.1;
Initial revision
 1.1.1.22 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.21 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.20 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.19 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.18 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.17 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.16 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.15 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.14 21-May-2000  veego Import IP Filter 3.4.3
 1.1.1.13 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.12 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.11 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.10 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.9 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.8 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.7 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.6 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.5 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.4 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.3 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.2 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.1 26-May-1997  darrenr Import new sources for 3.2alpha7
(blah, someone want to clean away /cvsroot/sys/netinet ?)
 1.4.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.5.2.4 24-Nov-1998  cgd pull up rev(s) 1.10-1.11 from trunk (ipfilter 3.2.10). (mrg)
 1.5.2.3 22-Jul-1998  mellon Pull up 1.9 (veego)
 1.5.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.5.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.11.16.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.11.10.2 27-Mar-2001  bouyer Sync with HEAD.
 1.11.10.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.11.4.1 20-Dec-1999  he Pull up revision 1.12 (requested by darrenr):
Update IPF to version 3.3.5.
 1.16.4.4 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.16.4.3 09-Feb-2002  he Pull up revisions 1.18-1.21 (via patch, requested by martti):
Updated IPFilter to 3.4.23.
 1.16.4.2 05-Apr-2001  he Apply patch (requested by mike):
Fix machine hangs with FTP from behind a NetBSD box configured
to do NAT and FTP proxying. Fixes PR#12443.
 1.16.4.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Casper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Casper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.17.2.6 20-Sep-2002  thorpej Sync with HEAD.
 1.17.2.5 04-May-2002  thorpej Update from trunk.
 1.17.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.17.2.3 28-Feb-2002  nathanw Catch up to -current.
 1.17.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.17.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.18.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.18.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.18.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.18.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.24.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.26.6.3 19-Oct-2004  skrll Sync with HEAD
 1.26.6.2 05-Aug-2004  skrll Fix merge mistakes.
 1.26.6.1 03-Aug-2004  skrll Sync with HEAD
 1.27.2.1 13-Aug-2004  jmc branches: 1.27.2.1.2;
Pullup rev 1.28 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.27.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.29 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.48 05-Oct-2007  dyoung Work in progress: use a raw socket for GRE in IP encapsulation
instead of adding/subtracting our own IPv4 header.

There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.

There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.

I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.

A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.

Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
 1.47 02-Sep-2007  dyoung branches: 1.47.2;
Be consistent: use the prefix sc_ for all members of the gre_softc.
 1.46 06-May-2007  dyoung branches: 1.46.2; 1.46.6; 1.46.8;
Oops, commit this straggler from the last change to net/if_gre.[ch].
 1.45 21-Mar-2007  dyoung If we do not recognize the protocol of a received packet, then
increase ifi_noproto. If the GRE header contains routing options,
increase the input-error count, ifi_ierrors.

While I am here, make some cosmetic changes: remove unnecessary
'proto' argument from gre_input3(). Shorten some staircases.
 1.44 16-Nov-2006  dyoung branches: 1.44.4; 1.44.8; 1.44.10; 1.44.12;
Use LIST_FOREACH().
 1.43 16-Nov-2006  dyoung Cosmetic: s/g_proto/sc_proto/. Remove superfluous parentheses and
curly braces.
 1.42 07-Sep-2006  dogcow branches: 1.42.2; 1.42.4;
remove more vestiges of CCITT, LLC, HDLC, NS, and NSIP.
 1.41 31-Aug-2006  dyoung Add a mode to gre(4) that sends GRE tunnel packets in UDP datagrams.
Fix MOBILE encapsulation. Add many debugging printfs (mainly
concerning UDP mode). Clean up the gre(4) code a bit. Add the
capability to setup UDP tunnels to ifconfig. Update documentation.

In UDP mode, gre(4) puts a GRE header onto transmitted packets,
and hands them to a UDP socket for transmission. That is, the
encapsulation looks like this: IP+UDP+GRE+encapsulated packet.

There are two ways to set up a UDP tunnel. One way is to tell the
source and destination IP+port to gre(4), and let gre(4) create
the socket. The other way to create a UDP tunnel is for userland
to "delegate" a UDP socket to the kernel.
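
The GRE layer of the IP+UDP+GRE+packet stacking described above can be sketched as follows. This is a minimal illustration, not the gre(4) code: the function name gre_encap and the constants are hypothetical, and it assumes the 4-byte base GRE header (16 bits of flags/version, 16 bits of protocol type) with no optional fields. The outer UDP and IP headers are assumed to be added by the socket layer and are not built here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRE_BASE_HDR_LEN 4      /* flags/version (2) + protocol type (2) */
#define GRE_PROTO_IPV4   0x0800 /* EtherType value for an IPv4 payload */

/*
 * Hypothetical helper: write a base GRE header followed by the payload
 * into out. Returns the number of bytes written, or 0 if out is too
 * small. The result is what would be handed to a UDP socket.
 */
size_t
gre_encap(uint8_t *out, size_t outlen, uint16_t proto,
    const uint8_t *payload, size_t paylen)
{
	assert(out != NULL && payload != NULL);
	if (outlen < GRE_BASE_HDR_LEN || paylen > outlen - GRE_BASE_HDR_LEN)
		return 0;
	out[0] = 0;                       /* no checksum/key/sequence flags */
	out[1] = 0;                       /* version 0 */
	out[2] = (uint8_t)(proto >> 8);   /* protocol type, network byte order */
	out[3] = (uint8_t)(proto & 0xff);
	memcpy(out + GRE_BASE_HDR_LEN, payload, paylen);
	return GRE_BASE_HDR_LEN + paylen;
}
```

Writing the protocol type byte by byte keeps the header in network byte order regardless of host endianness.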
 1.40 28-Jul-2006  dyoung Extract predicate M_UNWRITABLE(m, len), which is true iff len
consecutive bytes at the front of m are not writable (i.e., the data is
shared or read-only, or fewer than len bytes are contiguous).
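
A predicate of this kind can be sketched over a simplified buffer type. The struct and function names below are hypothetical stand-ins, not the kernel's mbuf or the actual macro. Following the macro's name, the sketch returns true when the first n bytes can not be written in place; that reading of the polarity is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for an mbuf; not the kernel's struct. */
struct sketch_buf {
	size_t len;       /* contiguous bytes available at the front */
	bool   shared;    /* storage is referenced by another buffer */
	bool   read_only; /* storage must not be modified */
};

/* True when the first n bytes cannot be written in place. */
bool
sketch_unwritable(const struct sketch_buf *b, size_t n)
{
	assert(b != NULL);
	return b->len < n || b->shared || b->read_only;
}
```

A caller that gets true would first obtain a private, contiguous copy of the leading bytes (revision 1.39 mentions m_pullup() for this) and only then write to the data.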
 1.39 28-Jul-2006  dyoung Fix mtod() usage. If we will write to the mbuf data, check whether
the data is read-only/shared and call m_pullup(). Otherwise,
extract a const pointer to the mbuf data.

XXX I should extract a new macro, M_WRITABLE(m, len), that is true
if m has len consecutive writable bytes at its front.

KNF slightly.

Use bpf_mtap_af().
 1.38 28-Jul-2006  dyoung Use bpf_mtap_af(). KNF slightly.
 1.37 31-Jan-2006  elad branches: 1.37.2; 1.37.6;
fix tyop.

pr 32678 from yves emmanuel jutard.
 1.36 11-Dec-2005  christos branches: 1.36.2;
merge ktrace-lwp.
 1.35 26-Jul-2005  christos PR/30844: Gert Doering: Non-inet traffic is passed to bpf incorrectly (as inet)
 1.34 30-Mar-2005  is branches: 1.34.2;
Add IPv6 over GRE (contributed by Gert Doering in PR 29150).
 1.33 26-Feb-2005  perry branches: 1.33.2;
nuke trailing whitespace
 1.32 03-Feb-2005  perry some ANSIfying, and remove an unsightly tab
 1.31 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.30 26-Apr-2004  matt branches: 1.30.4; 1.30.6;
Remove #else clause of __STDC__
 1.29 05-Sep-2003  itojun branches: 1.29.4;
u_short -> u_int16_t
 1.28 26-Jun-2003  itojun branches: 1.28.2;
tabify
 1.27 02-May-2003  itojun clear m_flags just for safety
 1.26 21-Apr-2003  itojun remove redundant adjustment of m->m_pkthdr.len
 1.25 21-Apr-2003  itojun correct arg to m_pullup (need to count IP header size as well)
 1.24 21-Apr-2003  itojun correct (false) assumptions on mbuf chain. not sure if it really helps, but
anyways, it is necessary to perform m_pullup.
 1.23 25-Nov-2002  simonb The "osrc" variable in gre_mobile_input() is only ever set but not
referenced; remove it.
 1.22 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.21 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.20 10-Aug-2002  itojun inject GRE packet to raw ip socket input, to support userland GRE decapsulator.
discussed on openbsd developers list.
 1.19 09-Jun-2002  itojun style
 1.18 09-Jun-2002  itojun whitespace
 1.17 13-Nov-2001  lukem branches: 1.17.8;
add RCSIDs
 1.16 13-Apr-2001  thorpej branches: 1.16.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.15 12-Dec-2000  thorpej branches: 1.15.2;
Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.14 20-Oct-2000  mjl Mark packets from gre as coming from appropriate gre interface, not
transport interface.
 1.13 25-Aug-2000  mjl Add bpf tap to gre interface.
 1.12 06-Jul-2000  thorpej Some slight cleanup.
 1.11 05-Jul-2000  thorpej Fix an omission in the gre cloning changes.
 1.10 30-Mar-2000  augustss branches: 1.10.4;
Remove register declarations.
 1.9 25-Oct-1999  drochner defopt the XNS protocol (options NS), clean up the use of related
option headers / defines
 1.8 19-Jan-1999  mycroft branches: 1.8.8; 1.8.10; 1.8.12;
Don't screw with ip_len; just subtract from it where we actually use the
value.
 1.7 11-Jan-1999  thorpej Pull the IP-in-IP tunneling support out of the GRE code. It's now handled
by a separate IP-IP input path.

XXX Should eventually do the same thing for IPPROTO_MOBILE.
 1.6 22-Dec-1998  thorpej Simplify the tunnel lookup routine.
 1.5 13-Oct-1998  kim Use ETHERTYPE_ATALK instead of ETHERTYPE_AT. The former seems more common.
Our other constants also use "ATALK".

Added many new ETHERTYPE constants to sys/net/ethertypes.h, including the
ones from libpcap and tcpdump "ethertype.h" files.
 1.4 07-Oct-1998  thorpej Fix some typos in comments, and clean up some whitespace.
 1.3 02-Oct-1998  kleink Use #error instead of causing a parse error.
 1.2 30-Sep-1998  hwr Start supporting IPPROTO_MOBILE (55) encapsulation. This is yet
another tunneling protocol used by the Mobile-IP people. See RFC 2004
for this.
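
For orientation, the header that this encapsulation inserts can be sketched as below. The layout is my reading of RFC 2004 (protocol, an S flag plus reserved bits, header checksum, original destination, and an original source address only when S is set) and should be checked against the RFC. The names mobile_hdr_len and the MOBILE_* constants are hypothetical and do not come from the NetBSD sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOBILE_PROTO_NUM   55   /* IP protocol number named in the log entry */
#define MOBILE_SBIT        0x80 /* "original source address present" flag */
#define MOBILE_HDR_MIN     8    /* protocol, flags, checksum, original dst */
#define MOBILE_HDR_WITHSRC 12   /* the above plus original source address */

/*
 * Hypothetical helper: given the start of a minimal forwarding header
 * and the number of bytes available, return the header length in bytes,
 * or 0 if the buffer is too short to hold the whole header.
 */
size_t
mobile_hdr_len(const uint8_t *hdr, size_t avail)
{
	size_t need;

	assert(hdr != NULL);
	if (avail < 2)
		return 0;
	need = (hdr[1] & MOBILE_SBIT) ? MOBILE_HDR_WITHSRC : MOBILE_HDR_MIN;
	return avail >= need ? need : 0;
}
```

A decapsulator would strip this many bytes, restore the original addresses into the IP header, and reinject the packet.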
 1.1 13-Sep-1998  hwr Add a gre tunnel pseudo network device. GRE = Generic Routing Encapsulation.
This device shows up like any other network interface and can be used to
tunnel L3 protocols, e.g. IP over IP.
 1.8.12.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.10.1 15-Nov-1999  fvdl Sync with -current
 1.8.8.3 21-Apr-2001  bouyer Sync with HEAD
 1.8.8.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.8.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.10.4.2 20-Oct-2000  tv Pullup 1.14 [mjl]:
Mark packets from gre as coming from appropriate gre interface, not
transport interface.
 1.10.4.1 25-Aug-2000  mjl Add bpf tap to gre interfaces. Approved by thorpej.
 1.15.2.7 11-Dec-2002  thorpej Sync with HEAD.
 1.15.2.6 17-Sep-2002  nathanw Catch up to -current.
 1.15.2.5 27-Aug-2002  nathanw Catch up to -current.
 1.15.2.4 13-Aug-2002  nathanw Catch up to -current.
 1.15.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.15.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.15.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.16.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.16.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.16.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.16.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.8.2 29-Aug-2002  gehenna catch up with -current.
 1.17.8.1 20-Jun-2002  gehenna catch up with -current.
 1.28.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.28.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.28.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.28.2.4 04-Feb-2005  skrll Sync with HEAD.
 1.28.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.28.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.28.2.1 03-Aug-2004  skrll Sync with HEAD
 1.29.4.1 08-May-2005  snj Pull up revision 1.34 (requested by is in ticket #1382):
Add IPv6 over GRE (contributed by Gert Doering in PR 29150).
 1.30.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.30.6.1 12-Feb-2005  yamt sync with head.
 1.30.4.1 29-Apr-2005  kent sync with -current
 1.33.2.1 30-Mar-2005  tron Pull up revision 1.34 (requested by is in ticket #80):
Add IPv6 over GRE (contributed by Gert Doering in PR 29150).
 1.34.2.4 27-Oct-2007  yamt sync with head.
 1.34.2.3 03-Sep-2007  yamt sync with head.
 1.34.2.2 30-Dec-2006  yamt sync with head.
 1.34.2.1 21-Jun-2006  yamt sync with head.
 1.36.2.1 01-Feb-2006  yamt sync with head.
 1.37.6.3 14-Sep-2006  yamt sync with head.
 1.37.6.2 03-Sep-2006  yamt sync with head.
 1.37.6.1 11-Aug-2006  yamt sync with head
 1.37.2.1 09-Sep-2006  rpaulo sync with head
 1.42.4.1 10-Dec-2006  yamt sync with head.
 1.42.2.1 18-Nov-2006  ad Sync with head.
 1.44.12.1 29-Mar-2007  reinoud Pullup to -current
 1.44.10.1 11-Jul-2007  mjf Sync with head.
 1.44.8.4 19-Oct-2007  ad Sync with head.
 1.44.8.3 12-Oct-2007  ad Fix merge errors.
 1.44.8.2 08-Jun-2007  ad Sync with head.
 1.44.8.1 10-Apr-2007  ad Sync with head.
 1.44.4.2 07-May-2007  yamt sync with head.
 1.44.4.1 24-Mar-2007  yamt sync with head.
 1.46.8.1 06-Nov-2007  matt sync with HEAD
 1.46.6.2 07-Oct-2007  joerg Sync with HEAD.
 1.46.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.46.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.47.2.1 06-Oct-2007  yamt sync with head.
 1.9 05-Oct-2007  dyoung Work in progress: use a raw socket for GRE in IP encapsulation
instead of adding/subtracting our own IPv4 header.

There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.

There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.

I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.

A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.

Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
 1.8 10-Dec-2005  elad branches: 1.8.30; 1.8.44; 1.8.46; 1.8.48;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.7 21-Apr-2004  itojun branches: 1.7.12;
no space between function name and paren: foo (blah) -> foo(blah)
 1.6 18-Apr-2004  matt De __P()
 1.5 09-Jun-2002  itojun branches: 1.5.6;
whitespace
 1.4 06-Jul-2000  thorpej branches: 1.4.2; 1.4.4; 1.4.16;
Some slight cleanup.
 1.3 07-Oct-1998  thorpej branches: 1.3.12;
Fix some typos in comments, and clean up some whitespace.
 1.2 30-Sep-1998  hwr Start supporting IPPROTO_MOBILE (55) encapsulation. This is yet
another tunneling protocol used by the Mobile-IP people. See RFC 2004
for this.
 1.1 13-Sep-1998  hwr Add a gre tunnel pseudo network device. GRE = Generic Routing Encapsulation.
This device shows up like any other network interface and can be used to
tunnel L3 protocols, e.g. IP over IP.
 1.3.12.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.16.1 20-Jun-2002  gehenna catch up with -current.
 1.4.4.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.4.2.1 20-Jun-2002  nathanw Catch up to -current.
 1.5.6.4 11-Dec-2005  christos Sync with head.
 1.5.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.5.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.5.6.1 03-Aug-2004  skrll Sync with HEAD
 1.7.12.2 27-Oct-2007  yamt sync with head.
 1.7.12.1 21-Jun-2006  yamt sync with head.
 1.8.48.1 06-Oct-2007  yamt sync with head.
 1.8.46.1 06-Nov-2007  matt sync with HEAD
 1.8.44.1 07-Oct-2007  joerg Sync with HEAD.
 1.8.30.1 09-Oct-2007  ad Sync with head.
 1.11 02-Oct-2004  christos These are ipfilter files, although they don't have the same copyright.
Thanks jaromir.
 1.10 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.9 28-Mar-2004  martti branches: 1.9.2;
Upgraded IPFilter to 4.1.1
 1.8 23-Jun-2003  martin branches: 1.8.2;
#ifdef _KERNEL_OPT police
 1.7 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.6 19-Sep-2002  martti branches: 1.6.2;
Resync with official IPF
 1.5 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.4 09-Jun-2002  itojun branches: 1.4.2;
whitespace
 1.3 02-May-2002  martti branches: 1.3.2; 1.3.4;
Upgraded IPFilter to 3.4.27
 1.2 01-Apr-2002  jdolecek branches: 1.2.2;
add RCS IDs
 1.1 01-Apr-2002  jdolecek branches: 1.1.1;
Initial revision
 1.1.1.5 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.4 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.3 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.2 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.1 01-Apr-2002  jdolecek Import H.323 proxy of IPFilter 3.4.25. Upon closer examination,
the QNX licence actually seems to allow both non-commercial and commercial
use.

According to Darren, the H.323 proxy code is buggy ATM, but is imported
here for reference anyway.
 1.2.2.5 20-Sep-2002  thorpej Sync with HEAD.
 1.2.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.2.2.3 04-May-2002  thorpej Update from trunk.
 1.2.2.2 17-Apr-2002  nathanw Catch up to -current.
 1.2.2.1 01-Apr-2002  nathanw file ip_h323_pxy.c was added on branch nathanw_sa on 2002-04-17 00:06:25 +0000
 1.3.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.3.2.1 20-Jun-2002  gehenna catch up with -current.
 1.4.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.4.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.4.2.1 09-Jun-2002  jdolecek file ip_h323_pxy.c was added on branch kqueue on 2002-06-23 17:50:50 +0000
 1.6.2.3 27-Nov-2002  itojun sys/netinet/ip_h323_pxy.c via patch
sys/netinet/ip_ipsec_pxy.c via patch
sys/netinet/ip_netbios_pxy.c via patch

Fix compilation on a.out systems.

(Thorsten Frueauf)
 1.6.2.2 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.6.2.1 19-Sep-2002  itojun file ip_h323_pxy.c was added on branch netbsd-1-5 on 2002-10-18 13:16:46 +0000
 1.8.2.2 19-Oct-2004  skrll Sync with HEAD
 1.8.2.1 03-Aug-2004  skrll Sync with HEAD
 1.9.2.1 13-Aug-2004  jmc branches: 1.9.2.1.2;
Pullup rev 1.10 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.9.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.11 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.2 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 4.1.1
 1.1.1.1.6.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.1.1.1.4.5 19-Oct-2004  skrll Sync with HEAD
 1.1.1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.1.4.1 28-Mar-2004  skrll file ip_htable.c was added on branch ktrace-lwp on 2004-08-03 10:54:39 +0000
 1.2 02-Oct-2004  jdolecek move ip_htable.h from sys/netinet/ to sys/dist/ipf/netinet/, it's ipfilter file
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 4.1.1
 1.1.1.1.6.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.1.1.1.4.5 19-Oct-2004  skrll Sync with HEAD
 1.1.1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.1.4.1 28-Mar-2004  skrll file ip_htable.h was added on branch ktrace-lwp on 2004-08-03 10:54:39 +0000
 1.180 05-Jun-2025  ozaki-r Apply if_first_addr() and if_first_addr_psref()
 1.179 22-Feb-2025  mlelstv Use canonical M_GETHDR macro. NFCI.
 1.178 29-Aug-2022  knakahara branches: 1.178.4; 1.178.10;
Add a sysctl entry to control whether a routing message is sent for RTM_DYNAMIC.

Some routing daemons require such routing messages to stay coherent.

To let the kernel send such messages, set net.inet.icmp.dynamic_rt_msg=1
for IPv4 and net.inet6.icmp6.dynamic_rt_msg=1 for IPv6.
The default (=0) keeps the previous behavior, that is, no such routing message is sent.
 1.177 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.176 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.175 15-Nov-2018  maxv Simplify the mtag API:

- Remove m_tag_init(), m_tag_first(), m_tag_next() and
m_tag_delete_nonpersistent().

- Remove the 't' argument from m_tag_delete_chain().
 1.174 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.173 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.172 21-Jun-2018  knakahara branches: 1.172.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, it is always called
with softnet_lock held: if NET_MPSAFE is not defined, the lock is
acquired in ipintr(); otherwise (NET_MPSAFE defined), it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.171 01-Jun-2018  ozaki-r Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being released accidentally. So rt_timer handlers must release the
reference of a passed rtentry by themselves (but they didn't).
 1.170 11-May-2018  maxv Retire ICMPPRINTFS, it's annoying and it doesn't build.
 1.169 26-Apr-2018  maxv Use M_UNWRITABLE, no functional change.
 1.168 08-Feb-2018  maxv branches: 1.168.2;
Fix a possible buffer overflow in the IPv4 _ctlinput functions.

In _icmp_input we are guaranteeing that the ICMP_ADVLENMIN-byte area
starting from 'icp' is contiguous.

ICMP_ADVLENMIN = 8 + sizeof(struct ip) + 8 = 36

But the _ctlinput functions (eg udp_ctlinput) expect the area to be
larger. These functions read at:

(uint8_t *)icp + 8 + (icp->icmp_ip.ip_hl << 2)

which can be crafted to be:

(uint8_t *)icp + 68

So we end up reading 'icp+68' while the valid area ended at 'icp+36'.

Having said that, it seems pretty complicated to trigger this bug; it
would have to be a fragmented packet with half of the ICMP header in the
first fragment, and we would need to have a driver that did not allocate
a cluster for the first mbuf of the chain.

The check of icmplen against ICMP_ADVLEN(icp) was not sufficient: while it
did guarantee that the ICMP header fit the chain, it did not guarantee
that it fit 'm'.

Fix this bug by pulling up to hlen+ICMP_ADVLEN(icp). No need to log an
error. Rebase the pointers afterwards.
 1.167 05-Feb-2018  maxv Declare icmperrppslim in ip_icmp.c, it shouldn't be used elsewhere.
 1.166 23-Jan-2018  maxv Don't use global variables, that's obviously incorrect on MP systems.
One remains, because it is imported in tcp_timer.c, and I'm not totally
sure of how it interacts with icmp_mtudisc().
 1.165 23-Jan-2018  maxv Style, localify icmp_send, and add a clear KASSERT (that replaces a vague
comment).
 1.164 22-Jan-2018  maxv Adapt previous, reintroduce MH_ALIGN. It's used as an optimization - we
can later prepend something to the current mbuf without having to allocate
a new mbuf.
 1.163 19-Jan-2018  maxv Fix a buffer overflow in icmp_error. We create in 'm' a packet that must
contain:

IPv4 header | Fixed part of ICMP header | Variable part of ICMP header

But we perform length checks on 'totlen', which does not count the IPv4
header.

So now, add sizeof(struct ip) in totlen, and stop doing this m_data
nonsense, just get the pointers as usual.
 1.162 19-Jan-2018  maxv Clarify icmp_error:

* Rename (and constify) oiplen -> oiphlen.

* Rename icmplen -> datalen, it's the size of the variable part of
the ICMP header, not the total size of the ICMP header itself.

* Introduce totlen, this is the total size of the ICMP header (icmp_ip
included).

No real functional change.
 1.161 31-Mar-2017  ozaki-r branches: 1.161.6;
Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)
 1.160 06-Mar-2017  ozaki-r Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029
 1.159 17-Feb-2017  ozaki-r Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock
 1.158 13-Feb-2017  ozaki-r Protect mtudisc and redirect stuffs of icmp/icmp6 with mutex

We have to run pr_init of icmp and icmp6 prior to tcp and tcp6 ones
for mutex initialization.
 1.157 07-Feb-2017  ozaki-r Add missing NULL checks for m_get_rcvif
 1.156 02-Feb-2017  ozaki-r Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net
 1.155 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.154 12-Dec-2016  ozaki-r branches: 1.154.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.153 25-Oct-2016  ozaki-r Remove unnecessary argument

No functional change.
 1.152 19-Oct-2016  ozaki-r Set ia to ensure to call ia4_release
 1.151 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.150 08-Jul-2016  ozaki-r branches: 1.150.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.149 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.148 06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later once we have confirmed
that nobody actually does.
 1.147 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and serves as a transition
moratorium, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.146 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are the counterparts of m_get_rcvif, which will come in another
commit; they hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.
 1.145 01-Apr-2016  ozaki-r Remove unnecessary casts and do s/0/NULL/ for rtrequest
 1.144 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.143 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.142 31-Aug-2015  ozaki-r Make rt_refcnt take into account rt_timer
 1.141 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.140 09-May-2015  christos if no address was found, don't check if it is tentative (hi Roy)
 1.139 09-May-2015  christos assign sin only when it is needed
 1.138 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETATCHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.137 24-Apr-2015  ozaki-r Use KASSERT instead of if & panic

rt can be NULL only because of a programming error (and we are sure it cannot be for now),
so we can use KASSERT here (i.e., check only if DIAGNOSTIC).
 1.136 24-Apr-2015  ozaki-r Replace 0 with NULL for pointer variables
 1.135 02-Dec-2014  christos use the new printing code.
 1.134 30-May-2014  christos branches: 1.134.4;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.133 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.132 25-Feb-2014  pooka branches: 1.132.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.131 05-Jun-2013  christos branches: 1.131.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.130 25-Mar-2013  christos PR/47693: Erik E. Fair: Add missing code to icmp handling.
- While there, add the rest of the missing codes
- Merge groups
- Fix indentation
 1.129 22-Mar-2012  drochner branches: 1.129.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.128 09-Jan-2012  liamjfoy branches: 1.128.2;
check against NULL
 1.127 31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.126 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.125 17-Jul-2011  joerg branches: 1.125.2; 1.125.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.124 02-Jul-2010  kefren manually adjust m_data and m_len so it can later be prepended with a
struct ip in case that a cluster is used. icmp len panic is not valid for
cluster case.

Fixes PR/43548
 1.123 26-Jun-2010  kefren Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.122 07-Dec-2009  christos branches: 1.122.2; 1.122.4;
PR/42243: Yasuoka Masahiko: Add "net.inet.icmp.bmcastecho" sysctl support,
to disable icmp replies to the broadcast address.
 1.121 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.120 18-Jun-2008  yamt branches: 1.120.6;
merge yamt-pf42 branch.
(import newer pf from OpenBSD 4.2)

ok'ed by peter@. requested by core@
 1.119 04-May-2008  thorpej branches: 1.119.2; 1.119.4;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.118 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.117 23-Apr-2008  thorpej branches: 1.117.2;
Use <net/net_stats.h> / netstat_sysctl().
 1.116 12-Apr-2008  thorpej branches: 1.116.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.115 06-Apr-2008  thorpej Change ICMP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.
 1.114 09-Nov-2007  dyoung branches: 1.114.14;
Use sockaddr_in_init(). KNF. No functional change intended.
 1.113 27-Aug-2007  dyoung branches: 1.113.2; 1.113.6; 1.113.8;
Cosmetic: 0 -> NULL. Remove unnecessary cast.
 1.112 19-Jul-2007  dyoung branches: 1.112.4; 1.112.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.111 04-Mar-2007  christos branches: 1.111.2; 1.111.10;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.110 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.109 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.
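
A rough sketch of the split between the two routines, assuming a simplified struct route that holds a reference to an external sockaddr. All toy_ names and the plain integer reference count are made up for illustration and do not match the kernel structures.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_rtentry { int refcnt; };
struct toy_sockaddr { unsigned char len; unsigned char family; };

struct toy_route {
	struct toy_rtentry *ro_rt;	/* cached route, or NULL */
	struct toy_sockaddr *ro_sa;	/* external destination, or NULL */
};

/* Forget the cached route but keep the destination. */
static void
toy_rtcache_clear(struct toy_route *ro)
{
	if (ro->ro_rt != NULL) {
		ro->ro_rt->refcnt--;
		ro->ro_rt = NULL;
	}
}

/* Forget the cached route and release the destination as well. */
static void
toy_rtcache_free(struct toy_route *ro)
{
	toy_rtcache_clear(ro);
	free(ro->ro_sa);
	ro->ro_sa = NULL;
}

void
toy_rtcache_example(void)
{
	struct toy_rtentry rt = { 2 };
	struct toy_route ro;

	ro.ro_rt = &rt;
	ro.ro_sa = malloc(sizeof(*ro.ro_sa));
	assert(ro.ro_sa != NULL);

	toy_rtcache_clear(&ro);		/* route gone, destination kept */
	assert(ro.ro_rt == NULL && ro.ro_sa != NULL && rt.refcnt == 1);

	toy_rtcache_free(&ro);		/* destination released too */
	assert(ro.ro_sa == NULL);
}
```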

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.108 29-Jan-2007  dyoung branches: 1.108.2;
bzero -> memset
 1.107 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.106 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.
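
The chaining described above might look roughly like this. It is a sketch with a hand-rolled doubly linked chain rather than the kernel's queue macros; the toy_ names are hypothetical and the reference counting on the route is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct toy_route {
	void *ro_rt;			/* cached route, NULL when none */
	struct toy_route *ro_next;	/* next cache on the chain */
	struct toy_route **ro_prevp;	/* pointer that points at this cache */
};

static struct toy_route *toy_rtcache_head;

/* Add a route cache with ro_rt != NULL to the chain. */
static void
toy_in_rtcache(struct toy_route *ro)
{
	assert(ro->ro_rt != NULL);
	ro->ro_next = toy_rtcache_head;
	if (toy_rtcache_head != NULL)
		toy_rtcache_head->ro_prevp = &ro->ro_next;
	ro->ro_prevp = &toy_rtcache_head;
	toy_rtcache_head = ro;
}

/* Invalidate one cache: set ro_rt to NULL and unlink it. */
static void
toy_in_rtflush(struct toy_route *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt = NULL;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = ro->ro_prevp;
	*ro->ro_prevp = ro->ro_next;
	ro->ro_next = NULL;
	ro->ro_prevp = NULL;
}

/* Walk the chain and invalidate every cache on it. */
static void
toy_in_rtflushall(void)
{
	while (toy_rtcache_head != NULL)
		toy_in_rtflush(toy_rtcache_head);
}

void
toy_rtflush_example(void)
{
	int a, b;
	struct toy_route r1 = { &a, NULL, NULL }, r2 = { &b, NULL, NULL };

	toy_in_rtcache(&r1);
	toy_in_rtcache(&r2);
	toy_in_rtflush(&r1);
	assert(r1.ro_rt == NULL && r2.ro_rt != NULL);
	toy_in_rtflushall();
	assert(r2.ro_rt == NULL && toy_rtcache_head == NULL);
}
```

Users of a cache then only need to test ro_rt against NULL to learn that a fresh lookup is required.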

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.105 16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.104 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.103 30-Aug-2006  christos branches: 1.103.2; 1.103.4;
fix initializers
 1.102 28-Aug-2006  yamt icmp_input: don't assume relations between PRC_ and ICMP_ values.
 1.101 10-Jul-2006  peter Wrap long lines, unwrap a short line.
 1.100 10-Jul-2006  peter Moves the PF_GENERATED m_tag to the new packet in icmp_error.
This is needed because the pf code can call icmp_error with setting
this tag, but the new packet should not be filtered when it comes back
to pf(4).

ok christos@
 1.99 29-Mar-2006  dyoung branches: 1.99.4;
When reflecting an ICMP Echo, do not scribble over read-only/shared
mbuf storage.
 1.98 22-Mar-2006  matt An MTU can't be negative so store them in unsigned variables.
 1.97 10-Nov-2005  christos branches: 1.97.6; 1.97.8; 1.97.10; 1.97.12; 1.97.14;
Remove redundant assignment (from Liam Foy)
 1.96 23-Oct-2005  christos No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.
 1.95 19-Aug-2005  christos branches: 1.95.2;
make ICMPPRINTFS work; from Liam Foy.
 1.94 05-Aug-2005  elad Add sysctls for IP, ICMP, TCP, and UDP statistics.
 1.93 19-Jul-2005  christos Implement PMTU checks from:

http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html

1. Don't act on ICMP-need-frag immediately if adhoc checks on the
advertised MTU fail. The MTU update is delayed until a TCP retransmit
happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.

From OpenBSD.
 1.92 29-Apr-2005  yamt branches: 1.92.2;
move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.91 26-Feb-2005  perry nuke trailing whitespace
 1.90 03-Feb-2005  perry ANSIfy function declarations
 1.89 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.88 24-Jan-2005  matt branches: 1.88.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.87 03-Aug-2004  cube branches: 1.87.4;
Remove a common (icmpstat).
 1.86 25-Jun-2004  itojun icmp_reflect: check if m_pkthdr.rcvif is non-NULL before touching it.
icmp_reflect could be called from the output path, so m_pkthdr.rcvif may not
be set. (found by panic when PF is configured "block return all")
 1.85 25-Jun-2004  itojun be careful touching m_pkthdr.rcvif, it could be NULL if the packet was
generated from local node and icmp_error calls icmp_reflect.
 1.84 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.83 26-Apr-2004  matt Remove #else clause of __STDC__
 1.82 24-Mar-2004  atatat branches: 1.82.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.81 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.80 13-Nov-2003  jonathan Add m_tag_delete_nonpersistent(), for deleting all packet tags on
mbuf chains which are recycled (e.g., ICMP reflection, loopback
interface). A consensus was reached that such recycled packets should
behave (more-or-less) the same way if a new chain had been allocated
and the contents copied to that chain.

Some packet tags may in future be marked as "persistent" (e.g., for
mandatory access controls) and should persist across such deletion.
NetBSD as yet has no persistent tags, so m_tag_delete_nonpersistent()
just deletes all tags. This should not be relied upon.
 1.79 11-Nov-2003  jonathan Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
 1.78 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.77 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.76 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.75 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.74 26-Jun-2003  itojun branches: 1.74.2;
fix comment
 1.73 17-Apr-2003  tron Clear hardware checksum flags before reusing a mbuf for an ICMP reply as
suggested by Enami Tsugutomo. This fixes PR kern/21203 by myself.
 1.72 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.71 23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.70 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.69 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
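
As an illustration of the alignment test, here is a sketch of what such a macro could look like. TOY_HDR_ALIGNED_P and TOY_NO_STRICT_ALIGNMENT are stand-in names chosen for this example, and the 4-byte alignment requirement is an assumption; the real macros are keyed on __NO_STRICT_ALIGNMENT as described above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the *_HDR_ALIGNED_P() macros: true when the header
 * pointer is 32-bit aligned.  On platforms without strict alignment
 * the test collapses to the constant 1, so no check is paid for.
 */
#ifdef TOY_NO_STRICT_ALIGNMENT
#define TOY_HDR_ALIGNED_P(p)	1
#else
#define TOY_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Usage on a strict-alignment build (TOY_NO_STRICT_ALIGNMENT undefined). */
void
toy_hdr_aligned_example(void)
{
	uint32_t words[2] = { 0, 0 };
	const unsigned char *p = (const unsigned char *)words;

	assert(TOY_HDR_ALIGNED_P(p));		/* start of a uint32_t array */
	assert(!TOY_HDR_ALIGNED_P(p + 1));	/* one byte in: misaligned */
}
```

A caller would take the align-if-needed path, for example via m_copyup(), only when the test fails.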
 1.68 13-Jun-2002  itojun set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)
 1.67 09-Jun-2002  itojun whitespace
 1.66 13-Nov-2001  lukem branches: 1.66.8; 1.66.10;
add RCSIDs
 1.65 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.64 04-Nov-2001  matt Keep only one mtu_table (the two were identical except for
one value - 65280).
 1.63 30-Oct-2001  kml Add in support for timing out IPv4 routes added due to redirects,
as discussed in tech-net several weeks ago. It turned out that
KAME had already added this functionality to the IPv6 stack, so
I followed their example in adding the sysctl variables
net.inet.icmp.rediraccept and net.inet.icmp.redirtimeout.
 1.62 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.61 20-Oct-2001  matt branches: 1.61.2;
Make the two MTU tables const and change their type to u_int (one was int
and one was u_long!).
 1.60 08-Mar-2001  itojun branches: 1.60.2;
Remove a bogus rtfree(); OpenBSD PR 1706.
 1.59 01-Mar-2001  itojun branches: 1.59.2;
make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.58 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.57 18-Oct-2000  itojun s/mtudisc_callback/icmp_&/ so that we don't feel conflict between IPv4 and
IPv6 counterpart. (or icmp4_&?)
 1.56 18-Oct-2000  itojun count successful path MTU changes. good for debugging.
(there could be some discussion on when to increase the counter...)
 1.55 18-Oct-2000  thorpej Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.
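
The callback arrangement can be sketched as follows. The fixed-size table, the toy_ names, and the callback signature are assumptions made for the example, not the kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAXCB 4

typedef void (*toy_mtudisc_cb)(uint32_t faddr);

static toy_mtudisc_cb toy_cbs[TOY_MAXCB];
static size_t toy_ncbs;
static int toy_tcp_notified;	/* counts calls to the sample callback */

/* A protocol registers interest in path MTU changes; 0 on success. */
static int
toy_mtudisc_callback_register(toy_mtudisc_cb cb)
{
	if (toy_ncbs == TOY_MAXCB)
		return -1;
	toy_cbs[toy_ncbs++] = cb;
	return 0;
}

/* Called by a protocol's ctlinput only after it validated the ICMP message. */
static void
toy_mtudisc(uint32_t faddr)
{
	size_t i;

	for (i = 0; i < toy_ncbs; i++)
		toy_cbs[i](faddr);
}

/* Sample callback: where TCP would refresh routes and re-enter slow-start. */
static void
toy_tcp_mtu_changed(uint32_t faddr)
{
	(void)faddr;
	toy_tcp_notified++;
}

void
toy_mtudisc_example(void)
{
	assert(toy_mtudisc_callback_register(toy_tcp_mtu_changed) == 0);
	toy_mtudisc(0x0a000001u);	/* path to 10.0.0.1 changed */
	assert(toy_tcp_notified == 1);
}
```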

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
 1.54 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.53 27-Jul-2000  itojun do not disable icmp error rate limitation for local address.
local address can be abused too. pps rate limitation should work fine for
moderate amount of icmp errors.
 1.52 24-Jul-2000  sommerfeld Improve robustness of icmp_error():
- allow it to work when icmpreturndatabytes is sufficiently large that the
icmp error message doesn't fit in a header mbuf.
- defend against mbuf chains shorter than their contained ip->ip_len.
 1.51 10-Jul-2000  itojun implement net.inet.icmp.errppslimit.
make default value for net.inet.icmp.errratelimit to 0, as < 10ms value
does not do the right thing.
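
A packets-per-second limit of this kind can be sketched as a counter that resets each second. This is a simplified illustration, not the kernel routine: the caller passes the current time in seconds, and treating a negative limit as "no limiting" is an assumption of the sketch.

```c
#include <assert.h>

struct toy_pps {
	long window_start;	/* second in which the current window began */
	int count;		/* errors sent in that window */
};

/* Return 1 if an ICMP error may be sent, 0 if it should be dropped. */
static int
toy_ppscheck(struct toy_pps *st, long now_sec, int maxpps)
{
	if (maxpps < 0)
		return 1;	/* assumption: negative means unlimited */
	if (now_sec != st->window_start) {
		st->window_start = now_sec;	/* new one-second window */
		st->count = 0;
	}
	if (st->count >= maxpps)
		return 0;
	st->count++;
	return 1;
}

void
toy_ppscheck_example(void)
{
	struct toy_pps st = { 0, 0 };

	assert(toy_ppscheck(&st, 10, 2) == 1);
	assert(toy_ppscheck(&st, 10, 2) == 1);
	assert(toy_ppscheck(&st, 10, 2) == 0);	/* third in the same second */
	assert(toy_ppscheck(&st, 11, 2) == 1);	/* next second: allowed again */
}
```

Unlike an interval-based limit, a per-second count does not depend on timer resolution finer than 10ms.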
 1.50 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.49 01-Jul-2000  sommerfeld Don't rate-limit ICMP errors from packets we send to ourselves.
The dns resolver depends on reliably receiving errors to allow it to
quickly detect a dead local nameserver.
 1.48 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.47 10-Jun-2000  darrenr branches: 1.47.2;
add icmpreturndatabytes kernel variable (default 8) which specifies the
number of extra data bytes to return in ICMP error messages. This is
also available via sysctl as net.inet.icmp.returndatabytes and is limited to
[8,512].
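
A sketch of the range check, assuming out-of-range values are rejected rather than clamped (the log does not say which); toy_set_returndatabytes() is a hypothetical setter, not the sysctl handler.

```c
#include <assert.h>

/* Accept newval only if it lies in [8,512]; return 0 on success. */
static int
toy_set_returndatabytes(int *cur, int newval)
{
	if (newval < 8 || newval > 512)
		return -1;	/* out of range: keep the old value */
	*cur = newval;
	return 0;
}

void
toy_returndatabytes_example(void)
{
	int cur = 8;	/* default */

	assert(toy_set_returndatabytes(&cur, 64) == 0 && cur == 64);
	assert(toy_set_returndatabytes(&cur, 4) == -1 && cur == 64);
	assert(toy_set_returndatabytes(&cur, 513) == -1 && cur == 64);
}
```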
 1.46 22-May-2000  itojun branches: 1.46.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).
 1.45 10-May-2000  itojun add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.
 1.44 30-Mar-2000  augustss Remove register declarations.
 1.43 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.42 24-Feb-2000  itojun don't transmit ICMPv4 packet back, if the original packet was encrypted.
 1.41 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.40 15-Feb-2000  thorpej Add ICMP error rate limiting, based on the same for ICMP6.

Note, we're reusing the previously unused slot for "MTU discovery" (which
was moved to the "net.inet.ip" branch of the sysctl tree quite some time
ago).
 1.39 25-Jan-2000  sommerfeld Pick source address for ICMP errors a bit more intelligently when
there are multiple addresses on the interface.

From Marc Horowitz <marc@netbsd.org>, who left this sitting for too long.
 1.38 09-Jul-1999  thorpej branches: 1.38.2;
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.37 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.36 30-Mar-1999  mycroft branches: 1.36.4; 1.36.6;
Fix a null pointer dereference in the case where forwarding is turned on and
there are interfaces up but with no addresses.
 1.35 19-Jan-1999  mycroft There's just no plausible reason to byte-swap ip_id internally. It's opaque.
 1.34 19-Jan-1999  mycroft Don't screw with ip_len; just subtract from it where we actually use the
value.
 1.33 19-Jan-1999  mycroft Fix byte-swapping of ip_len in returned IP header.
 1.32 11-Jan-1999  thorpej Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.
 1.31 19-Dec-1998  thorpej Reverse the copyright-notice-swap. It went against existing practice.
 1.30 30-Sep-1998  tls branches: 1.30.4;
Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.
 1.29 29-Apr-1998  kml Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.
 1.28 15-Feb-1998  tls Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.
 1.27 13-Feb-1998  tls Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
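
To illustrate why a hash helps with many addresses, here is a minimal sketch of a chained hash keyed on the IPv4 address. The table size, the hash function, and all toy_ names are invented for the example and are not those of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NHASH 8	/* power of two; illustrative size */

struct toy_ifaddr {
	uint32_t addr;			/* IPv4 address */
	struct toy_ifaddr *hash_next;	/* next entry in the same bucket */
};

static struct toy_ifaddr *toy_hashtbl[TOY_NHASH];

/* Fold the four octets together; not the kernel's hash function. */
static unsigned int
toy_hash(uint32_t addr)
{
	return (addr ^ (addr >> 8) ^ (addr >> 16) ^ (addr >> 24)) &
	    (TOY_NHASH - 1);
}

static void
toy_insert(struct toy_ifaddr *ia)
{
	unsigned int h = toy_hash(ia->addr);

	ia->hash_next = toy_hashtbl[h];
	toy_hashtbl[h] = ia;
}

/* Only the addresses in one bucket are compared, not the whole list. */
static struct toy_ifaddr *
toy_lookup(uint32_t addr)
{
	struct toy_ifaddr *ia;

	for (ia = toy_hashtbl[toy_hash(addr)]; ia != NULL; ia = ia->hash_next)
		if (ia->addr == addr)
			return ia;
	return NULL;
}

void
toy_hash_example(void)
{
	static struct toy_ifaddr a = { 0x0a000001u, NULL };
	static struct toy_ifaddr b = { 0x0a000002u, NULL };

	toy_insert(&a);
	toy_insert(&b);
	assert(toy_lookup(0x0a000001u) == &a);
	assert(toy_lookup(0x0a000002u) == &b);
	assert(toy_lookup(0x0a000003u) == NULL);
}
```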
 1.26 29-Oct-1997  kml Changes to path MTU discovery to correctly handle "needs
fragmentation" ICMP messages that specify a new MTU size of zero
(from, say, old buggy Linux kernels).
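
One way to cope with a zero next-hop MTU is the plateau table suggested by RFC 1191. The sketch below assumes that approach and uses the RFC's plateau values; it is not taken from the commit, and toy_next_mtu() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Plateau values from RFC 1191, largest first; 0 terminates the table. */
static const unsigned int toy_mtu_table[] = {
	65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68, 0
};

/*
 * Pick the new path MTU.  advertised is the next-hop MTU from the ICMP
 * message (0 if the router did not fill it in); dropped_len is the
 * total length of the datagram that was dropped.
 */
static unsigned int
toy_next_mtu(unsigned int advertised, unsigned int dropped_len)
{
	size_t i;

	if (advertised != 0)
		return advertised;
	/* Fall back to the largest plateau strictly below the dropped size. */
	for (i = 0; toy_mtu_table[i] != 0; i++) {
		if (toy_mtu_table[i] < dropped_len)
			return toy_mtu_table[i];
	}
	return 0;	/* nothing below the smallest plateau */
}

void
toy_next_mtu_example(void)
{
	assert(toy_next_mtu(1400, 1500) == 1400);	/* router supplied an MTU */
	assert(toy_next_mtu(0, 1500) == 1492);		/* zero MTU: use the table */
	assert(toy_next_mtu(0, 1492) == 1006);
	assert(toy_next_mtu(0, 68) == 0);
}
```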
 1.25 18-Oct-1997  kml branches: 1.25.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
 1.24 17-Oct-1997  kml Path MTU Discovery support. This is turned off by default.
Use sysctl -w net.inet.icmp.mtudisc=1 to turn on.
Still to come: path removal after some period, black hole detection
 1.23 24-Jun-1997  thorpej Increment icmpstat.icps_badlen for bad length of ICMP_MASKREQ, per
Stevens in TCP/IP Illustrated vol. 2, p.319. Submitted by
Koji Imada <koji@math.human.nagoya-u.ac.jp> in PR #3712.
 1.22 13-Oct-1996  christos backout previous kprintf changes
 1.21 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.20 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.19 13-Feb-1996  christos netinet prototypes
 1.18 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.17 04-Jun-1995  mycroft Don't cast things unnecessarily.
 1.16 04-Jun-1995  mycroft Clean up many more casts.
 1.15 01-Jun-1995  mycroft Don't use INADDR_* constants in case labels.
 1.14 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.13 31-May-1995  mycroft Integrate multicast 3.5 distribution, with several bugs fixed and general
cleanup. This is a (working) snapshot of work in progress.
 1.12 15-May-1995  cgd KNF
 1.11 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.10 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.7 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.6 08-Jan-1994  mycroft More prototypes.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.2 20-May-1993  cgd more rcsid additions and file header cleanups
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.25.2.3 01-Oct-1998  cgd pull up revisions 1.27-1.28, 1.30 from trunk. (tls)
 1.25.2.2 09-May-1998  mycroft Pull up patch from kml.
 1.25.2.1 30-Oct-1997  mellon Pull rev 1.26 up from trunk
 1.30.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.36.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.36.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.36.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.36.4.2 02-Aug-1999  thorpej Update from trunk.
 1.36.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.38.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.38.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.38.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.46.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.47.2.5 06-Apr-2001  he Pull up revision 1.58 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.47.2.4 11-Mar-2001  he Pull up revision 1.59 (requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.47.2.3 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are rough estimate - we had some trial-and-error in main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.47.2.2 28-Jul-2000  sommerfeld Pull up UDP, ICMP fixes:

- Drop packet, increment udps_badlen if the udp header length field
reports a size smaller than the udp header; defends against bogus
packets seen by Assar Westerlund.

- allow icmp_error() to work when icmpreturndatabytes is sufficiently
large that the icmp error message doesn't fit in a header mbuf.

- defend against mbuf chains shorter than their contained ip->ip_len.

Joint work of myself, itojun, and assar
Approved by thorpej

revisions pulled up:
sys/netinet/ip_icmp.c 1.52
sys/netinet/udp_usrreq.c 1.70
 1.47.2.1 02-Jul-2000  sommerfeld Pull up 1.49: don't rate-limit ICMP we send to ourselves.
 1.59.2.7 18-Oct-2002  nathanw Catch up to -current.
 1.59.2.6 27-Aug-2002  nathanw Catch up to -current.
 1.59.2.5 01-Aug-2002  nathanw Catch up to -current.
 1.59.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.59.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.59.2.2 22-Oct-2001  nathanw Catch up to -current.
 1.59.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.60.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.60.2.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.60.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.60.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.61.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.66.10.2 16-Jun-2003  grant Pull up revision 1.73 (requested by tron in ticket #1260):

Clear hardware checksum flags before reusing a mbuf for an ICMP reply as
suggested by Enami Tsugutomo. This fixes PR kern/21203 by myself.
 1.66.10.1 15-Jun-2002  lukem Pull up revision 1.68 (requested by itojun in ticket #266):
set IPv4 parameter to modern value.
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)
 1.66.8.3 29-Aug-2002  gehenna catch up with -current.
 1.66.8.2 15-Jul-2002  gehenna catch up with -current.
 1.66.8.1 20-Jun-2002  gehenna catch up with -current.
 1.74.2.8 11-Dec-2005  christos Sync with head.
 1.74.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.74.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.74.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.74.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.74.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.74.2.2 12-Aug-2004  skrll Sync with HEAD.
 1.74.2.1 03-Aug-2004  skrll Sync with HEAD
 1.82.2.2 03-Aug-2004  jmc Pullup rev 1.85-1.87 (requested by christos in ticket #732)

icmp_reflect: check if m_pkthdr.rcvif is non-NULL before touching it.
icmp_reflect could be called from the output path, so m_pkthdr.rcvif may not
be set. (found by panic when PF is configured "block return all")
 1.82.2.1 28-May-2004  tron Pull up revision 1.84 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.87.4.1 29-Apr-2005  kent sync with -current
 1.88.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.88.2.1 12-Feb-2005  yamt sync with head.
 1.92.2.5 15-Nov-2007  yamt sync with head.
 1.92.2.4 03-Sep-2007  yamt sync with head.
 1.92.2.3 26-Feb-2007  yamt sync with head.
 1.92.2.2 30-Dec-2006  yamt sync with head.
 1.92.2.1 21-Jun-2006  yamt sync with head.
 1.95.2.1 26-Oct-2005  yamt sync with head
 1.97.14.2 31-Mar-2006  tron Merge 2006-03-31 NetBSD-current into the "peter-altq" branch.
 1.97.14.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.97.12.1 19-Apr-2006  elad sync with head.
 1.97.10.3 03-Sep-2006  yamt sync with head.
 1.97.10.2 11-Aug-2006  yamt sync with head
 1.97.10.1 01-Apr-2006  yamt sync with head.
 1.97.8.1 22-Apr-2006  simonb Sync with head.
 1.97.6.1 09-Sep-2006  rpaulo sync with head
 1.99.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.103.4.3 18-Dec-2006  yamt sync with head.
 1.103.4.2 10-Dec-2006  yamt sync with head.
 1.103.4.1 22-Oct-2006  yamt sync with head
 1.103.2.3 01-Feb-2007  ad Sync with head.
 1.103.2.2 12-Jan-2007  ad Sync with head.
 1.103.2.1 18-Nov-2006  ad Sync with head.
 1.108.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.108.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.111.10.2 03-Sep-2007  skrll Sync with HEAD.
 1.111.10.1 15-Aug-2007  skrll Sync with HEAD.
 1.111.2.2 09-Oct-2007  ad Sync with head.
 1.111.2.1 20-Aug-2007  ad Sync with HEAD.
 1.112.6.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.112.6.1 19-Jul-2007  dyoung file ip_icmp.c was added on branch matt-mips64 on 2007-07-19 20:48:55 +0000
 1.112.4.2 11-Nov-2007  joerg Sync with HEAD.
 1.112.4.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.113.8.1 19-Nov-2007  mjf Sync with HEAD.
 1.113.6.1 13-Nov-2007  bouyer Sync with HEAD
 1.113.2.1 09-Jan-2008  matt sync with HEAD
 1.114.14.2 29-Jun-2008  mjf Sync with HEAD.
 1.114.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.116.2.2 18-May-2008  yamt sync with head.
 1.116.2.1 19-Apr-2008  yamt Peter Postma's work-in-progress pf import from OpenBSD 4.2.
updated to -current by me.
 1.117.2.4 11-Aug-2010  yamt sync with head.
 1.117.2.3 11-Mar-2010  yamt sync with head
 1.117.2.2 04-May-2009  yamt sync with head.
 1.117.2.1 16-May-2008  yamt sync with head.
 1.119.4.1 18-Jun-2008  simonb Sync with head.
 1.119.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.120.6.1 09-Jun-2013  msaitoh Pull up following revision(s) (requested by fair in ticket #1855):
sys/netinet/ip_icmp.c: revision 1.130
PR/47693: Erik E. Fair: Add missing code to icmp handling.
- While there, add the rest of the missing codes
- Merge groups
- Fix indentation
 1.122.4.1 03-Jul-2010  rmind sync with head
 1.122.2.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.125.6.2 05-Apr-2012  mrg sync to latest -current.
 1.125.6.1 18-Feb-2012  mrg merge to -current.
 1.125.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.125.2.1 17-Apr-2012  yamt sync with head
 1.128.2.1 31-Mar-2013  riz Pull up following revision(s) (requested by fair in ticket #860):
sys/netinet/ip_icmp.c: revision 1.130
PR/47693: Erik E. Fair: Add missing code to icmp handling.
- While there, add the rest of the missing codes
- Merge groups
- Fix indentation
 1.129.2.3 03-Dec-2017  jdolecek update from HEAD
 1.129.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.129.2.1 23-Jun-2013  tls resync from head
 1.131.2.2 18-May-2014  rmind sync with head
 1.131.2.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.132.2.1 10-Aug-2014  tls Rebase.
 1.134.4.9 28-Aug-2017  skrll Sync with HEAD
 1.134.4.8 05-Feb-2017  skrll Sync with HEAD
 1.134.4.7 05-Dec-2016  skrll Sync with HEAD
 1.134.4.6 05-Oct-2016  skrll Sync with HEAD
 1.134.4.5 09-Jul-2016  skrll Sync with HEAD
 1.134.4.4 22-Apr-2016  skrll Sync with HEAD
 1.134.4.3 22-Sep-2015  skrll Sync with HEAD
 1.134.4.2 06-Jun-2015  skrll Sync with HEAD
 1.134.4.1 06-Apr-2015  skrll Sync with HEAD
 1.150.2.5 26-Apr-2017  pgoyette Sync with HEAD
 1.150.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.150.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.150.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.150.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.154.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.161.6.3 13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
with softnet_lock held: in the !defined(NET_MPSAFE) case it is
acquired in ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.161.6.2 08-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #852):

sys/netinet6/icmp6.c: revision 1.238
sys/netinet/ip_icmp.c: revision 1.171
sys/net/route.c: revision 1.210

Fix _rt_free via rtrequest(RTM_DELETE) hangs in rt_timer handlers

A rt_timer handler is passed a rtentry with an extra reference that prevents the
rtentry from being accidentally released. So rt_timer handlers must release
the reference of a passed rtentry by themselves (but they didn't).
 1.161.6.1 31-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #675):

sys/netinet/ip_icmp.c: revision 1.168

Fix a possible buffer overflow in the IPv4 _ctlinput functions.

In _icmp_input we are guaranteeing that the ICMP_ADVLENMIN-byte area
starting from 'icp' is contiguous.

ICMP_ADVLENMIN = 8 + sizeof(struct ip) + 8 = 36

But the _ctlinput functions (eg udp_ctlinput) expect the area to be
larger. These functions read at:

(uint8_t *)icp + 8 + (icp->icmp_ip.ip_hl << 2)

which can be crafted to be:

(uint8_t *)icp + 68

So we end up reading 'icp+68' while the valid area ended at 'icp+36'.

Having said that, it seems pretty complicated to trigger this bug; it
would have to be a fragmented packet with half of the ICMP header in the
first fragment, and we would need to have a driver that did not allocate
a cluster for the first mbuf of the chain.

The check of icmplen against ICMP_ADVLEN(icp) was not sufficient: while it
did guarantee that the ICMP header fit the chain, it did not guarantee
that it fit 'm'.

Fix this bug by pulling up to hlen+ICMP_ADVLEN(icp). No need to log an
error. Rebase the pointers afterwards.
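The arithmetic in the message above can be checked with a small standalone sketch. It is not the kernel code: the constant and function names below are hypothetical, and the 20-byte option-less IP header size is taken from the "= 36" figure in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants mirroring the numbers quoted in the log message. */
enum {
	ICMP_HDR_LEN   = 8,	/* fixed part of the ICMP header */
	IP_MIN_HDR_LEN = 20,	/* sizeof(struct ip), no IP options */
	QUOTED_PAYLOAD = 8	/* bytes of the original datagram quoted */
};

/* Contiguous length guaranteed before the fix (the ICMP_ADVLENMIN value). */
static size_t
advlen_min(void)
{
	return ICMP_HDR_LEN + IP_MIN_HDR_LEN + QUOTED_PAYLOAD;
}

/* Offset from 'icp' at which a _ctlinput function starts reading. */
static size_t
ctlinput_offset(uint8_t ip_hl)
{
	assert(ip_hl <= 15);	/* ip_hl is a 4-bit field */
	return ICMP_HDR_LEN + ((size_t)ip_hl << 2);
}

/* Length that must be contiguous for the embedded header length ip_hl. */
static size_t
advlen(uint8_t ip_hl)
{
	return ctlinput_offset(ip_hl) + QUOTED_PAYLOAD;
}

/* Nonzero if reading the quoted payload stays inside 'contiguous' bytes. */
static int
read_is_safe(uint8_t ip_hl, size_t contiguous)
{
	return advlen(ip_hl) <= contiguous;
}
```

With ip_hl = 15 the read starts at offset 68, past the 36 bytes that were guaranteed, which is why the fix pulls up a length that depends on the embedded header length.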
 1.168.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.168.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.168.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.168.2.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.168.2.3 25-Jun-2018  pgoyette Sync with HEAD
 1.168.2.2 21-May-2018  pgoyette Sync with HEAD
 1.168.2.1 02-May-2018  pgoyette Synch with HEAD
 1.172.2.1 10-Jun-2019  christos Sync with HEAD
 1.178.10.1 02-Aug-2025  perseant Sync with HEAD
 1.178.4.1 01-Oct-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1164):

sys/net/link_proto.c: revision 1.41
sys/netinet6/in6.c: revision 1.293
sys/net/if.h: revision 1.307
sys/netinet/ip_icmp.c: revision 1.180
sys/dev/vmt/vmt_subr.c: revision 1.11
sys/netinet6/in6_var.h: revision 1.105
sys/netinet6/in6_var.h: revision 1.106
sys/net/if.c: revision 1.532
sys/net/if.c: revision 1.533
sys/netinet6/mld6.c: revision 1.102
sys/netinet/in_var.h: revision 1.104
sys/net/if_spppsubr.c: revision 1.270
sys/net/if_spppsubr.c: revision 1.271
sys/netinet6/nd6.c: revision 1.284

if: introduce if_first_addr() (and psref variant)

It returns a first address (ifa) of a given family on a given interface.

It will replace a bunch of open codes and make their intent clear.
Apply if_first_addr() and if_first_addr_psref()

in6: introduce in6ifa_first_lladdr() (and psref variant)

It returns a first link-local address (ifa) on a given interface.
Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.44 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.43 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.42 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.41 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.40 14-Sep-2018  maxv branches: 1.40.12;
Use non-variadic function pointer in protosw::pr_input.
 1.39 08-Feb-2018  maxv branches: 1.39.2; 1.39.4;
Use C99 types - in particular, stop using n_time and n_short -, style, and
remove prototype of icmp_sysctl (does not exist). No functional change.
 1.38 23-Jan-2018  maxv Style, localify icmp_send, and add a clear KASSERT (that replaces a vague
comment).
 1.37 19-Jan-2018  maxv Move the ICMP Extension structures from mpls_ttl.c to ip_icmp.h; that's
part of the ICMP protocol (per RFC4884), and not specific to MPLS. Also
add ih_exthdr in struct icmp, the 'length' field appeared.

While here, style in MPLS.
 1.36 19-Jan-2018  maxv Style, explain a bit, and fix icmp_radv, it should be icmp_dun.id_radv.
 1.35 17-Feb-2017  ozaki-r Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock
 1.34 18-Feb-2015  christos branches: 1.34.2; 1.34.4;
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
XXX: pullup-7
 1.33 24-Dec-2011  christos branches: 1.33.2; 1.33.6; 1.33.8; 1.33.16; 1.33.22; 1.33.24;
missing comma
 1.32 24-Dec-2011  christos add ICMP_STRINGS, a few more missing constants.
 1.31 24-Dec-2011  christos fix whitespace only
 1.30 24-Dec-2011  christos add SKIP, fix PHOTURIS codes
 1.29 23-Dec-2011  jmc Include the ICMP_PHOTURIS error codes if we're going to define ICMP_PHOTURIS
 1.28 23-Dec-2011  christos make ICMP_MAXTYPE 18 again to unbreak stats.
 1.27 23-Dec-2011  christos add missing icmp types.
 1.26 26-Jun-2010  kefren branches: 1.26.8; 1.26.12;
Add MPLS support, proposed on tech-net@ a couple of days ago

Welcome to 5.99.33
 1.25 08-Sep-2008  gmcgarry branches: 1.25.4; 1.25.14; 1.25.16;
Replace most gcc-specific __attribute__ uses with BSD-style sys/cdef.h
preprocessor macros.
 1.24 25-Dec-2007  perry branches: 1.24.6; 1.24.10; 1.24.12; 1.24.16;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.23 14-May-2006  christos branches: 1.23.34; 1.23.40; 1.23.44; 1.23.48;
Comment out attribute packed. Gcc4 warns us that the field is too narrow
for packing. Produces the same size struct on i386 (28 bytes)
 1.22 10-Dec-2005  elad branches: 1.22.4; 1.22.6; 1.22.8; 1.22.12;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.21 23-Oct-2005  christos No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.
 1.20 21-Apr-2004  itojun branches: 1.20.12; 1.20.14;
no space between function name and paren: foo (blah) -> foo(blah)
 1.19 18-Apr-2004  matt De __P()
 1.18 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.17 09-Jun-2002  itojun branches: 1.17.6;
whitespace
 1.16 30-Oct-2001  kml branches: 1.16.8;
Add in support for timing out IPv4 routes added due to redirects,
as discussed in tech-net several weeks ago. It turned out that
KAME had already added this functionality to the IPv6 stack, so
I followed their example in adding the sysctl variables
net.inet.icmp.rediraccept and net.inet.icmp.redirtimeout.
 1.15 18-Oct-2000  thorpej branches: 1.15.2; 1.15.4; 1.15.8;
Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
 1.14 20-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol data.
 1.13 10-Feb-1998  perry branches: 1.13.12; 1.13.14; 1.13.20;
add/cleanup multiple inclusion protection.
 1.12 26-Aug-1997  thorpej Add ICMP unreachable code #13 - "Communication Administratively Prohibited",
per RFC 1716. From Havard Eidnes <he@vader.runit.sintef.no>, PR #4038.
 1.11 03-Aug-1996  neil branches: 1.11.10;
Prototypes and definitions for ICMP Router Discovery, From FreeBSD.

rdisc coming soon! :-)
 1.10 13-Feb-1996  christos netinet prototypes
 1.9 17-Apr-1995  cgd spacing cleanup. also, minor type mixup fixups.
 1.8 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.7 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 08-Jan-1994  mycroft More prototypes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.11.10.1 28-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.13.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.12.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.13.12.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.13.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.15.8.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.15.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.15.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.15.2.2 20-Jun-2002  nathanw Catch up to -current.
 1.15.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.16.8.1 20-Jun-2002  gehenna catch up with -current.
 1.17.6.5 11-Dec-2005  christos Sync with head.
 1.17.6.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.17.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.17.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.17.6.1 03-Aug-2004  skrll Sync with HEAD
 1.20.14.1 26-Oct-2005  yamt sync with head
 1.20.12.2 21-Jan-2008  yamt sync with head
 1.20.12.1 21-Jun-2006  yamt sync with head.
 1.22.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.22.8.1 24-May-2006  yamt sync with head.
 1.22.6.1 01-Jun-2006  kardel Sync with head.
 1.22.4.1 09-Sep-2006  rpaulo sync with head
 1.23.48.1 02-Jan-2008  bouyer Sync with HEAD
 1.23.44.1 26-Dec-2007  ad Sync with head.
 1.23.40.1 18-Feb-2008  mjf Sync with HEAD.
 1.23.34.1 09-Jan-2008  matt sync with HEAD
 1.24.16.1 19-Oct-2008  haad Sync with HEAD.
 1.24.12.1 24-Sep-2008  wrstuden Merge in changes between wrstuden-revivesa-base-2 and
wrstuden-revivesa-base-3.
 1.24.10.2 11-Aug-2010  yamt sync with head.
 1.24.10.1 04-May-2009  yamt sync with head.
 1.24.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.25.16.1 03-Jul-2010  rmind sync with head
 1.25.14.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.25.4.1 09-Jun-2013  msaitoh Apply patch (in ticket #1855):
Add some macros. This is a part of ip_icmp.h rev. 1.32.
 1.26.12.1 18-Feb-2012  mrg merge to -current.
 1.26.8.1 17-Apr-2012  yamt sync with head
 1.33.24.2 28-Aug-2017  skrll Sync with HEAD
 1.33.24.1 06-Apr-2015  skrll Sync with HEAD
 1.33.22.1 21-Feb-2015  martin Pull up following revision(s) (requested by christos in ticket #537):
sys/netinet/icmp_var.h: revision 1.30
sys/netinet/ip_icmp.h: revision 1.34
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
XXX: pullup-7
 1.33.16.1 21-Feb-2015  martin Pull up following revision(s) (requested by christos in ticket #1258):
sys/netinet/icmp_var.h: revision 1.30
sys/netinet/ip_icmp.h: revision 1.34
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
 1.33.8.1 21-Feb-2015  martin Pull up following revision(s) (requested by christos in ticket #1258):
sys/netinet/icmp_var.h: revision 1.30
sys/netinet/ip_icmp.h: revision 1.34
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
 1.33.6.1 03-Dec-2017  jdolecek update from HEAD
 1.33.2.1 21-Feb-2015  martin Pull up following revision(s) (requested by christos in ticket #1258):
sys/netinet/icmp_var.h: revision 1.30
sys/netinet/ip_icmp.h: revision 1.34
PR/49676: Ryo Shimizu: ICMP_STATINC() buffer overflows
 1.34.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.34.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.39.4.1 10-Jun-2019  christos Sync with HEAD
 1.39.2.1 30-Sep-2018  pgoyette Sync with HEAD
 1.40.12.1 03-Apr-2021  thorpej Sync with HEAD.
 1.17 08-Mar-2021  christos remove now unused pseudo-random ip id code.
 1.16 18-Oct-2019  msaitoh branches: 1.16.8;
s/initalize/initialize/ in comment or printf message.
 1.15 19-Nov-2011  tls branches: 1.15.50;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.14 05-Nov-2010  rmind branches: 1.14.8;
ip_randomid: make mechanism MP-safe and more modular.

OK matt@
 1.13 04-Nov-2010  matt Replace the copyright with a new TNF copyright since nothing of the old
ip_id.c remains. Remove old comments which have no relevance anymore.
 1.12 06-Feb-2008  matt branches: 1.12.30; 1.12.32;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
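For orientation, an incremental Fisher-Yates shuffle over the 16-bit ID space can be sketched as below. This is only an illustration of the shuffle itself: it does not reproduce the sliding window of the committed code, the names id_pool, id_pool_init and id_pool_get are hypothetical, and the xorshift32 generator is a deterministic stand-in for the kernel's random source.

```c
#include <stddef.h>
#include <stdint.h>

#define ID_COUNT 65536u

struct id_pool {
	uint16_t ids[ID_COUNT];	/* permutation being built in place */
	uint32_t next;		/* first slot not yet handed out */
	uint32_t rng;		/* xorshift32 state (stand-in RNG) */
};

/* Small deterministic PRNG; assumption: NOT what the kernel uses. */
static uint32_t
xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

static void
id_pool_init(struct id_pool *p, uint32_t seed)
{
	for (uint32_t i = 0; i < ID_COUNT; i++)
		p->ids[i] = (uint16_t)i;
	p->next = 0;
	p->rng = seed != 0 ? seed : 1;	/* xorshift state must be nonzero */
}

/*
 * One Fisher-Yates step: swap a randomly chosen not-yet-used slot into
 * position 'next' and return it. Within one pass no ID repeats.
 * The modulo introduces a slight bias, acceptable for a sketch.
 */
static uint16_t
id_pool_get(struct id_pool *p)
{
	if (p->next == ID_COUNT)
		p->next = 0;		/* start a fresh permutation */

	uint32_t span = ID_COUNT - p->next;
	uint32_t j = p->next + xorshift32(&p->rng) % span;
	uint16_t tmp = p->ids[p->next];

	p->ids[p->next] = p->ids[j];
	p->ids[j] = tmp;
	return p->ids[p->next++];
}
```

A plain shuffle like this can hand out the same ID twice in quick succession across the boundary between two passes; presumably that is the kind of problem a sliding window is meant to address.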
 1.11 30-Aug-2006  christos branches: 1.11.28; 1.11.34;
static comes first
 1.10 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.9 11-Dec-2005  christos branches: 1.9.4; 1.9.6; 1.9.8; 1.9.14;
merge ktrace-lwp.
 1.8 23-Mar-2004  itojun branches: 1.8.4; 1.8.18;
typo
 1.7 02-Jan-2004  itojun no need for tmp = arc4randomid here
 1.6 26-Dec-2003  wiz Niels Provos kindly agreed to drop clauses 3 and 4 from the
license -- thanks.
Based on OpenBSD commit and hints by itojun.
 1.5 10-Dec-2003  itojun comment from niels provos;
- seed2 is necessary, but use it as "seed2 + x" not "seed2 ^ x".
- skipping number is not needed, so disable it for 16bit generator (makes
the repetition period to 30000)
 1.4 25-Nov-2003  itojun "seed2" was ruining non-repeating property, so remove it. discussed on tech-net
 1.3 19-Nov-2003  jonathan Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.
 1.2 16-Sep-2003  itojun exp is reserved name under posix
 1.1 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.8.18.3 11-Feb-2008  yamt sync with head.
 1.8.18.2 30-Dec-2006  yamt sync with head.
 1.8.18.1 21-Jun-2006  yamt sync with head.
 1.8.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.8.4.2 03-Aug-2004  skrll Sync with HEAD
 1.8.4.1 23-Mar-2004  skrll file ip_id.c was added on branch ktrace-lwp on 2004-08-03 10:54:39 +0000
 1.9.14.1 19-Jun-2006  chap Sync with head.
 1.9.8.2 03-Sep-2006  yamt sync with head.
 1.9.8.1 26-Jun-2006  yamt sync with head.
 1.9.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.9.4.1 09-Sep-2006  rpaulo sync with head
 1.11.34.1 18-Feb-2008  mjf Sync with HEAD.
 1.11.28.1 23-Mar-2008  matt sync with HEAD
 1.12.32.1 05-Mar-2011  rmind sync with head
 1.12.30.1 06-Nov-2010  uebayasi Sync with HEAD.
 1.14.8.1 17-Apr-2012  yamt sync with head
 1.15.50.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.16.8.1 03-Apr-2021  thorpej Sync with HEAD.
 1.406 17-Jul-2025  ozaki-r in: avoid racy ia4_acquire(ifatoia(rt->rt_ifa)) in ip_rtaddr()

Same as the case of ip_output(), it's racy and should be avoided.

PR kern/59527
 1.405 17-Jun-2025  ozaki-r in: avoid packet looping on incoming packets destined for an initializing address

The initialization of an IPv4 address is done by adding a connected route and
a local route (if necessary), and then publishing itself by adding it to the
global list (and the global hashtable). Thus, there can exist a route with an
address that is not published. This inconsistent state allows an incoming
packet destined for a host address that is not published but has a
local route to be forwarded and routed to a loopback interface. This results
in forwarding the packet back to ip_input, that is, packet looping.

To avoid the situation, prohibit packets being forwarded via a local route.

This is a workaround for "IPv4 address initialization atomicity" in doc/TODO.smpnet.
 1.404 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.403 29-Jun-2024  riastradh branches: 1.403.2;
netinet: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.402 02-Sep-2022  thorpej branches: 1.402.4;
pktqueue: Re-factor sysctl handling.

Provide a new pktq_sysctl_setup() function that attaches standard
pktq sysctl nodes below a specified parent node, with either a
fixed node ID or CTL_CREATE to dynamically assign node IDs. Make
all of the sysctl handlers private to pktqueue.c, and remove the
INET- and INET6-specific pktqueue sysctl code from net/if.c.
 1.401 08-Mar-2021  christos remove now unused pseudo-random ip id code.
 1.400 07-Mar-2021  christos netinet: Enable random IP fragment ids by default (from riastradh)
 1.399 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
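The difference between the two checks shows up as soon as the type's size is not a power of two. The sketch below uses C11 alignof in place of __alignof; the macro names and the example struct are hypothetical, chosen only to illustrate the point made in the message.

```c
#include <stdalign.h>
#include <stdint.h>

/* Example type: on common ABIs, size 12 and alignment 4. */
struct three_words {
	uint32_t w[3];
};

/* Old style: assumes sizeof(t) is a power of two equal to the alignment. */
#define ALIGNED_BY_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* New style: asks the compiler for the ABI alignment of the type. */
#define ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)
```

For address 8 the sizeof-based mask is 11 (binary 1011), so the check reports the pointer as misaligned even though an object with 4-byte alignment can live there; the alignof-based check accepts it.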
 1.398 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.397 28-Aug-2020  ozaki-r branches: 1.397.2;
inet: reduce silent packet discards
 1.396 28-Aug-2020  ozaki-r inet: pull m_get_rcvif_psref out of ip_input for simplicity

Same as ip6_input.
 1.395 28-Aug-2020  ozaki-r ipsec: rename ipsec_ip_input to ipsec_ip_input_checkpolicy

Because it just checks if a packet passes security policies.
 1.394 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.393 13-Nov-2019  ozaki-r Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.392 19-Sep-2019  ozaki-r Apply some missing changes lost on the previous commit
 1.391 19-Sep-2019  ozaki-r Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
 1.390 15-Sep-2019  bouyer Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.389 13-May-2019  ozaki-r branches: 1.389.2;
Count packets dropped by pfil
 1.388 17-Jan-2019  knakahara Fix ipsecif(4) cannot apply input direction packet filter. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.
 1.387 15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.386 02-Sep-2018  maxv remove reference to ipnat, and duplicate comments
 1.385 10-Jul-2018  maxv Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.
 1.384 17-May-2018  maxv branches: 1.384.2;
Add KASSERTs, related to PR/39794.
 1.383 14-May-2018  maxv Merge ipsec4_input and ipsec6_input into ipsec_ip_input. Make the argument
a bool for clarity. Optimize the function: if M_CANFASTFWD is not there
(because already removed by the firewall) leave now.

Makes it easier to see that M_CANFASTFWD is not removed on IPv6.
 1.382 10-May-2018  maxv Rename ipsec4_forward -> ipsec_mtu, and switch to void.
 1.381 26-Apr-2018  maxv Remove unused mbuf argument from sbsavetimestamp.
 1.380 15-Apr-2018  maxv Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.
 1.379 11-Apr-2018  maxv Don't pass IP_ALLOWBROADCAST in ipsec4_input. The flag lands in
ipsec_getpolicybyaddr, and only IP_FORWARDING is taken.

In fact it would be good to change the 'flags' argument of ipsec4_input
to be a boolean, same for ipsec_getpolicybyaddr. It would be less
misleading.
 1.378 11-Apr-2018  maxv Add comment about IPsec.
 1.377 11-Apr-2018  maxv Small changes in ip_dooptions: replace bcopy by memcpy, the areas can't
overlap.
 1.376 24-Feb-2018  ozaki-r branches: 1.376.2;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is taken while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.375 09-Feb-2018  maxv Remove dead code.
 1.374 07-Feb-2018  maxv Remove null check on ip, it can't be null. (Confuses code scanners.)
 1.373 06-Feb-2018  maxv Typos and style a bit, no functional change.
 1.372 05-Feb-2018  maxv Exterminate IPSENDREDIRECTS and IPMTUDISCTIMEOUT, neither is documented.
 1.371 05-Feb-2018  maxv Nuke DIRECTED_BROADCAST, it is not documented and not enabled anywhere. It
probably wouldn't have built correctly anyway, since there is no associated
defflag.

These ten lines of code in ip_input.c already look a lot better.
 1.370 05-Feb-2018  maxv Clean up this mess. This is typically the kind of places where we need to
seriously cut the bullshit. These things are unreadable, undocumented, and
all they bought us was not figuring out we had IPv4 forwarding enabled by
default for 20+ years.
 1.369 05-Feb-2018  maxv Be tougher, and don't allow LSRR+SSRR (RFC7126).
 1.368 05-Feb-2018  maxv Kick duplicate options, they are not allowed (RFC791).
 1.367 05-Feb-2018  maxv Remove unused variable.
 1.366 05-Feb-2018  maxv Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:

source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network

And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.
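
For illustration only, here is a minimal userland sketch of how the option area of an IPv4 header can be scanned for the source-route options discussed above. It is not the kernel's ip_dooptions; the helper name has_source_route and the OPT_* macro names are assumptions made for this example, while the option type values come from RFC 791.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option type values from RFC 791. */
#define OPT_EOL   0x00  /* end of option list */
#define OPT_NOP   0x01  /* no operation (one byte of padding) */
#define OPT_LSRR  0x83  /* loose source and record route */
#define OPT_SSRR  0x89  /* strict source and record route */

/*
 * Hypothetical helper (not kernel code): walk the option bytes that
 * follow the fixed 20-byte IPv4 header.
 * Returns 1 if a source-route option is present, 0 if none is found,
 * and -1 if the option area is malformed.
 */
static int
has_source_route(const uint8_t *opts, size_t len)
{
    size_t i = 0;

    assert(opts != NULL || len == 0);
    while (i < len) {
        uint8_t type = opts[i];
        uint8_t optlen;

        if (type == OPT_EOL)
            break;
        if (type == OPT_NOP) {
            i++;
            continue;
        }
        /* Every other option has a length byte that covers itself. */
        if (i + 1 >= len)
            return -1;
        optlen = opts[i + 1];
        if (optlen < 2 || optlen > len - i)
            return -1;
        if (type == OPT_LSRR || type == OPT_SSRR)
            return 1;
        i += optlen;
    }
    return 0;
}
```

A receiver that does not want to honour source routing could call such a helper on the option bytes and drop the packet when it returns non-zero.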
 1.365 05-Feb-2018  maxv Style, no functional change.
 1.364 01-Jan-2018  christos 1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
   IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
   the parameter it's supplied with:
   - If it's sizeof(int), assume it's being used as in Linux:
     - If it's non-zero, turn on the IP_RECVPKTINFO option.
     - If it's zero, turn off the IP_RECVPKTINFO option.
   - If it's sizeof(struct in_pktinfo), assume it's being used as in
     Solaris, to set a default for the source interface and/or
     source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
   on data size, and just added a fallback to a Linux (and current NetBSD)
   compatible value if the size is unknown (as it is now), or,
   in the future, if the calling application specifies a receiving
   buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo
 1.363 24-Nov-2017  roy Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
 1.362 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify the remaining
KERNEL_LOCK and/or softnet_lock uses that are held even if NET_MPSAFE.

No functional change
 1.361 27-Sep-2017  ozaki-r Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
 1.360 27-Jul-2017  ozaki-r Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
 1.359 19-Jul-2017  ozaki-r Correct a comment
 1.358 08-Jul-2017  christos Reorder the controls to the ones that need an interface and the ones that
don't; process the ones that don't first. Add a DIAGNOSTIC if there is no
interface; really this should be a KASSERT/panic because it is a bug if the
interface is not set at this point.
 1.357 06-Jul-2017  christos remove unnecessary casts (no functional change)
 1.356 06-Jul-2017  christos Merge the two copies SO_TIMESTAMP/SO_OTIMESTAMP processing to a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.
 1.355 01-Jun-2017  chs branches: 1.355.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.354 31-Mar-2017  ozaki-r Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)
 1.353 31-Mar-2017  ozaki-r Don't use a single global variable as temporary storage for multiple packets

It's not MP-safe. So use local variables instead.
 1.352 06-Mar-2017  ozaki-r Make sure icmp_redirect_timeout_q and ip_mtudisc_timeout_q are initialized on bootup

Fix PR kern/52029
 1.351 17-Feb-2017  ozaki-r Fix return value
 1.350 17-Feb-2017  ozaki-r Protect sysctl_net_inet_ip_pmtudto with icmp_mtx instead of softnet_lock
 1.349 07-Feb-2017  ozaki-r Add missing NULL checks for m_get_rcvif
 1.348 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.347 12-Dec-2016  ozaki-r branches: 1.347.2;
Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.346 08-Dec-2016  ozaki-r Use psref for ip_rtaddr

ip_rtaddr will be sleepable soon. So use psref instead of pserialize.
 1.345 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of the changes for the MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.344 18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.343 18-Oct-2016  ozaki-r Avoid double frees of mbuf

May fix one of the panics reported by Tom Ivar Helbekkmo in PR kern/51522
 1.342 11-Oct-2016  ozaki-r Fix kernel builds with IFA_STATS
 1.341 07-Sep-2016  roy Disallow input to detached addresses because they are not yet valid.
 1.340 31-Aug-2016  ozaki-r Make ipforward_rt and ip6_forward_rt percpu

Sharing one rtcache between CPUs is just a bad idea.

Reviewed by knakahara@
 1.339 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.338 26-Jul-2016  ozaki-r Fix downmatch increment
 1.337 08-Jul-2016  ozaki-r branches: 1.337.2;
CID 1363344: remove dead code

We may need to reconsider a case when m_get_rcvif_psref returns NULL.
 1.336 07-Jul-2016  ozaki-r Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.335 06-Jul-2016  ozaki-r Switch the IPv4 address list to pslist(9)

Note that we leave the old list just in case; it seems there are some
kvm(3) users accessing the list. We can remove it later if we confirmed
nobody does actually.
 1.334 06-Jul-2016  ozaki-r Add and use pslist(9)-based hashtable for IPv4 addresses

Note that we leave the old hashtable to keep vmstat -H working.
 1.333 04-Jul-2016  ozaki-r Separate IP address matching functions

No functional change intended.
 1.332 30-Jun-2016  ozaki-r Tidy up goto labels

No functional change.
 1.331 30-Jun-2016  ozaki-r Fix error paths

Some error paths did m_put_rcvif_psref twice.
 1.330 28-Jun-2016  ozaki-r Add missing NULL checks for m_get_rcvif_psref
 1.329 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer to an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such an object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored in a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and is meant for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.328 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.327 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.326 08-Jan-2016  knakahara eliminate ip_input.c and ip6_input.c dependency on gif(4)
 1.325 13-Oct-2015  roy Include arp.h to restore the sysctl net.inet.ip.dad_count.
Fixes PR kern/49883 thanks to HITOSHI Osada.
 1.324 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.323 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some code in sys/net* uses time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap, say via settimeofday(2), according to time_second(9). We
should use time_uptime instead to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
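
The conversion between the two time bases described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names and the explicit clock parameters are invented for the example, with now_uptime standing in for time_uptime and now_wall for time_second.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not kernel code.
 * now_uptime: seconds since boot (monotonic).
 * now_wall:   seconds since the epoch (may jump when the clock is set).
 */

/* Uptime-based timestamp -> wall-clock timestamp, e.g. for userland. */
static int64_t
uptime_to_wall(int64_t t_uptime, int64_t now_uptime, int64_t now_wall)
{
    assert(now_uptime >= 0);
    return t_uptime - now_uptime + now_wall;
}

/* Wall-clock timestamp from userland -> uptime-based timestamp. */
static int64_t
wall_to_uptime(int64_t t_wall, int64_t now_uptime, int64_t now_wall)
{
    assert(now_uptime >= 0);
    return t_wall - now_wall + now_uptime;
}

/* An entry has expired once the monotonic clock reaches its deadline. */
static int
expired(int64_t expire_uptime, int64_t now_uptime)
{
    return now_uptime >= expire_uptime;
}
```

Because the expiry test only compares monotonic values, setting the wall clock forwards or backwards cannot expire an entry early or keep it alive too long; only the value shown to userland shifts.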
 1.322 02-May-2015  joerg Fix !ARP build.
 1.321 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.320 26-Mar-2015  ozaki-r Tidy up the regular path of ip_forward

No functional change is intended.
 1.319 16-Jun-2014  ozaki-r branches: 1.319.2; 1.319.4; 1.319.6; 1.319.10;
Add 3rd argument to pktq_create to pass sc

It will be used to pass bridge sc for bridge_forward softint.

ok rmind@
 1.318 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.317 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.
 1.316 29-May-2014  rmind Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.315 28-May-2014  christos CID 12164{49,51}: Remove bogus ifp == NULL checks; if ifp was really NULL,
we would have been dead a few lines before the tests.
 1.314 23-May-2014  rmind ip_input(), ip_savecontrol(): cache m->m_pkthdr.rcvif in a variable.
 1.313 23-May-2014  rmind Make ip_forward() static, there is no need to expose it.
 1.312 23-May-2014  rmind Make ip_input() static, there is no need to expose it.
 1.311 22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong to. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.310 19-Mar-2014  liamjfoy branches: 1.310.2;
Remove ipflow_prune and replace with ipflow_reap. ok rmind@
 1.309 25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.308 29-Jun-2013  rmind - Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.307 27-Jun-2013  christos branches: 1.307.2;
flip src/dst
 1.306 27-Jun-2013  christos implement IP_PKTINFO and IP_RECVPKTINFO.
 1.305 08-Jun-2013  rmind Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.
 1.304 05-Jun-2013  christos IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.303 29-Nov-2012  christos Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.
 1.302 25-Jun-2012  christos branches: 1.302.2;
rename rfc6056 -> portalgo, requested by yamt
 1.301 22-Jun-2012  christos PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.300 02-Jun-2012  dsl Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.299 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.298 09-Jan-2012  liamjfoy branches: 1.298.2; 1.298.6; 1.298.8;
check against NULL
 1.297 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.296 31-Aug-2011  plunky branches: 1.296.2; 1.296.6;
NULL does not need a cast
 1.295 03-May-2011  dyoung *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
 1.294 14-Apr-2011  dyoung In ipintr(), don't overwrite ipintrq.ifq_maxlen with IFQ_MAXLEN.

Initialize ipintrq.ifq_maxlen using IFQ_MAXLEN directly instead of using
the global ipqmaxlen. Get rid of the global ipqmaxlen.

Now it works again to override the maximum IP queue length with, for
example, sysctl -w net.inet.ip.ifq.maxlen=5.
 1.293 13-Dec-2010  matt branches: 1.293.2;
Back out rev that shouldn't have been committed.
 1.292 11-Dec-2010  matt Add routines to calculate a checksum if the driver concludes that the
h/w can't do it.
 1.291 05-Nov-2010  rmind ip_randomid: make mechanism MP-safe and more modular.

OK matt@
 1.290 05-Nov-2010  rmind ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.
 1.289 19-Jul-2010  rmind Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@
 1.288 13-Jul-2010  rmind Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@
 1.287 09-Jul-2010  rmind ip_input: move lookup for fragment queue a little bit further. OK matt@.
 1.286 01-Apr-2010  tls As suggested by at least 3 different people (the guilty parties know who
they are) avoid repeated kernel_lock/unlock by using an intrq on the stack.

About 5%-10% better from run to run, on my *very* simpleminded test. Can't
possibly be worse.
 1.285 31-Mar-2010  tls Don't hold kernel lock across call to ip_input() -- it blocked *all*
hardware interrupts for the length of time it took for all dequeued
packets to flow up the stack (on multiprocessors only). Initial testing
shows performance impact is minimal -- since this temporary fix actually
means taking/releasing the kernel lock per-packet, that seems
acceptable.

Holding the kernel lock across the ip_input() call duplicated the
exclusion intended to be provided by the socket locks/softnet lock
(same lock, for INET/INET6 sockets) and could mask serious bugs. Several
hours' testing didn't turn any up but I'd be surprised if some don't now
appear.

Damon Permezel noticed the problem. Temporary fix suggested by matt@.
 1.284 16-Sep-2009  pooka branches: 1.284.2; 1.284.4;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.283 17-Jul-2009  minskim Delete trailing whitespace.
 1.282 16-Jul-2009  minskim Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
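
As a rough illustration of the receive side, the sketch below walks the ancillary data of a msghdr that recvmsg(2) has filled in and extracts the TTL. It assumes the BSD layout (level IPPROTO_IP, type IP_RECVTTL, one unsigned byte of data); other systems may use a different type or size, and the helper name ttl_from_msg is invented for this example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the TTL found in the control messages of
 * msg, or -1 if no TTL control message is present.
 * Assumption: the TTL is delivered as a single unsigned byte.
 */
static int
ttl_from_msg(struct msghdr *msg)
{
    struct cmsghdr *cm;

    assert(msg != NULL);
    for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IP &&
            cm->cmsg_type == IP_RECVTTL &&
            cm->cmsg_len >= CMSG_LEN(sizeof(unsigned char)))
            return *(unsigned char *)CMSG_DATA(cm);
    }
    return -1;
}
```

Before calling recvmsg(2), the application would enable the option with setsockopt(2) at level IPPROTO_IP and point msg_control at a buffer of at least CMSG_SPACE(sizeof(unsigned char)) bytes.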
 1.281 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.280 15-Apr-2009  elad Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.
 1.279 18-Mar-2009  cegger bcopy -> memcpy
 1.278 19-Jan-2009  christos branches: 1.278.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.
 1.277 17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.276 23-Nov-2008  rmind ip_input: fix an IPQ "lock" leak. (hi <matt>!)
 1.275 04-Oct-2008  pooka branches: 1.275.2; 1.275.4;
POOL_INIT -> pool_init
 1.274 05-Sep-2008  seanb Wrong route being consulted in one place
in ip_forward() after change to rtcache_*().
Restore previous behaviour.
 1.273 20-Aug-2008  matt Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.
 1.272 05-May-2008  ad branches: 1.272.2; 1.272.6;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.271 04-May-2008  thorpej Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.270 02-May-2008  ad PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.
 1.269 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.268 24-Apr-2008  ad branches: 1.268.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.267 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.266 12-Apr-2008  thorpej branches: 1.266.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.265 09-Apr-2008  thorpej - ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).
 1.264 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.263 27-Mar-2008  cube - Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.
 1.262 06-Feb-2008  matt branches: 1.262.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
 1.261 14-Jan-2008  dyoung Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
 1.260 22-Dec-2007  matt Fix offset calculation.
Make sure that all frags use the same TOS.
 1.259 21-Dec-2007  matt Also make sure the first is at 68 bytes long.
 1.259 21-Dec-2007  matt Also make sure the first fragment is at least 68 bytes long.
start at less than 68 bytes (minimal fragment size).
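
A minimal sketch of such an offset check is shown below. It is not the committed kernel code: the function name and macros are assumptions for this example, and ip_off is taken to be the IPv4 flags/fragment-offset field already converted to host byte order.

```c
#include <assert.h>
#include <stdint.h>

#define FRAG_OFFMASK 0x1fff /* low 13 bits of ip_off: offset in 8-byte units */
#define MIN_FRAG     68     /* minimal fragment size named in the log entry */

/*
 * Hypothetical check: returns 1 if the fragment offset is acceptable,
 * 0 if the fragment should be dropped.
 */
static int
frag_offset_ok(uint16_t ip_off)
{
    unsigned int off = (unsigned int)(ip_off & FRAG_OFFMASK) << 3;

    if (off == 0)
        return 1;               /* initial fragment or unfragmented packet */
    return off >= MIN_FRAG;     /* non-initial: must start at 68 bytes or later */
}
```

Since offsets are multiples of 8, the first acceptable non-initial offset under this rule is 72 bytes.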
 1.257 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.256 26-Nov-2007  yamt branches: 1.256.2; 1.256.6;
inetctlerrmap: use designated initializer.
 1.255 09-Nov-2007  kefren Don't MCLAIM in ipintr() because we do it anyway in ip_input()
 1.254 02-Oct-2007  dyoung branches: 1.254.2; 1.254.4;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.
 1.253 11-Sep-2007  degroote branches: 1.253.2;
In some FAST_IPSEC, spl level is not restored correctly. Fix that.

Spotted by Wolfgang Stukenbrock in pr/36800
 1.252 30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.251 10-Aug-2007  dyoung branches: 1.251.2;
Use sockaddr_dl_init().
 1.250 19-Jul-2007  dyoung branches: 1.250.4; 1.250.6;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.249 02-May-2007  dyoung branches: 1.249.2;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.248 25-Mar-2007  liamjfoy Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.
 1.247 24-Mar-2007  liamjfoy Don't call ip*flow_reap if we're just looking up maxflows
 1.246 12-Mar-2007  ad branches: 1.246.2; 1.246.4;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.245 05-Mar-2007  liamjfoy branches: 1.245.2;
Move ipflow_slowtimo from ip_slowtimo and into in_proto.c

ok matt@
 1.244 04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.243 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.242 29-Jan-2007  dyoung branches: 1.242.2;
Cosmetic: remove extraneous, non-KNF parentheses. Change a
sizeof(type) to a sizeof(*ptr) so the correctness of the statement
is evident "at a glance" (or so I hope).
 1.241 22-Dec-2006  ad ipintr(): check if the queue is empty before looping. Hardly a giant
win, but removed 30% of splnet() calls in one local test.
 1.240 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.239 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
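
The chaining and invalidation of route caches described in this entry might look roughly like the sketch below. It is a userland illustration under stated assumptions: the structure layouts, the hand-rolled list linkage, and the *_sketch names are hypothetical and only mirror the roles of rtcache(), rtflush() and rtflushall().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct rtentry and struct route. */
struct rtentry_sketch {
	int rt_refcnt;
};

struct route_sketch {
	struct rtentry_sketch *ro_rt;	/* cached route, or NULL */
	struct route_sketch *ro_next;	/* chain of live caches */
	struct route_sketch **ro_prevp;
};

/* Head of the chain of caches where ro_rt != NULL. */
static struct route_sketch *rtcache_head;

/* Notify the domain of a newly cached route: add it to the chain. */
static void
rtcache_sketch(struct route_sketch *ro)
{
	assert(ro->ro_rt != NULL);
	ro->ro_next = rtcache_head;
	ro->ro_prevp = &rtcache_head;
	if (rtcache_head != NULL)
		rtcache_head->ro_prevp = &ro->ro_next;
	rtcache_head = ro;
}

/* Invalidate one cache: drop the reference, set ro_rt to NULL,
 * and remove the cache from the chain. */
static void
rtflush_sketch(struct route_sketch *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt->rt_refcnt--;
	ro->ro_rt = NULL;
	*ro->ro_prevp = ro->ro_next;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = ro->ro_prevp;
	ro->ro_next = NULL;
	ro->ro_prevp = NULL;
}

/* Invalidate every cache on the chain, e.g. when a route is added. */
static void
rtflushall_sketch(void)
{
	while (rtcache_head != NULL)
		rtflush_sketch(rtcache_head);
}
```

Because only caches with a non-NULL ro_rt sit on the chain, users of a cache have to expect ro_rt to turn to NULL at any flush and be ready to look the route up again.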
 1.238 06-Dec-2006  dyoung KNF.
 1.237 06-Dec-2006  dyoung KNF.
 1.236 16-Nov-2006  christos branches: 1.236.2; 1.236.4;
__unused removal on arguments; approved by core.
 1.235 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.234 10-Oct-2006  dogcow change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)
 1.233 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.232 19-Sep-2006  elad Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.
 1.231 13-Sep-2006  elad branches: 1.231.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.
 1.230 08-Sep-2006  elad First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.229 30-Aug-2006  christos branches: 1.229.2;
fix initializer
 1.228 30-Jul-2006  elad ugh.. more stuff that's overdue and should not be in 4.0: remove the
sysctl(9) flags CTLFLAG_READONLY[12]. luckily they're not documented
so it's only half regression.

only two knobs used them; proc.curproc.corename (check added in the
existing handler; it's CTLFLAG_ANYWRITE, yay) and net.inet.ip.forwsrcrt,
that got its own handler now too.
 1.227 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.226 08-May-2006  liamjfoy branches: 1.226.2;
#if -> #ifdef

ok christos
 1.225 15-Apr-2006  christos Coverity CID 1134: Protect against NULL deref.
 1.224 18-Feb-2006  joerg branches: 1.224.2; 1.224.4; 1.224.6;
Print the source and destination IP in ip_forward's DIAGNOSTIC code
with inet_ntoa, making it more human friendly.

From Liam J. Foy in private mail.
 1.223 24-Dec-2005  perry branches: 1.223.2; 1.223.4; 1.223.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.222 11-Dec-2005  christos merge ktrace-lwp.
 1.221 01-Nov-2005  christos Don't decrement the ttl, until we are sure that we can forward this packet.
Before if there was no route, we would call icmp_error with a datagram
packet that has an incorrect checksum. (From Liam Foy)
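
The ordering this fix describes can be sketched as below: the TTL is only decremented once a route is known, so the error path still sees the header with its original, valid checksum. The types, the result codes, and the have_route flag are hypothetical simplifications, not the ip_forward() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal view of an IPv4 header. */
struct ip_sketch {
	uint8_t ip_ttl;
};

enum fwd_result { FWD_OK, FWD_TTL_EXCEEDED, FWD_NO_ROUTE };

/* have_route stands in for the outcome of the route lookup. */
static enum fwd_result
ip_forward_sketch(struct ip_sketch *ip, bool have_route)
{
	if (ip->ip_ttl <= 1)
		return FWD_TTL_EXCEEDED;
	if (!have_route) {
		/* Header untouched: an ICMP error built from this
		 * packet quotes a header whose checksum still matches. */
		return FWD_NO_ROUTE;
	}
	/* Only now is it safe to modify the header. */
	ip->ip_ttl -= 1;
	return FWD_OK;
}
```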
 1.220 23-Oct-2005  christos No need to pass an interface when only the mtu is needed. From OpenBSD via
Liam Foy.
 1.219 05-Aug-2005  elad branches: 1.219.2;
Add sysctls for IP, ICMP, TCP, and UDP statistics.
 1.218 28-Jun-2005  seanb branches: 1.218.2;
- Return ICMP_UNREACH_NET when no route found as per
section 4.3.3.1 of rfc1812.
 1.217 09-Jun-2005  atatat Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.
 1.216 01-Jun-2005  blymn Unconstify rnode to prevent compile error when GATEWAY option set.
 1.215 29-Apr-2005  yamt move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.214 18-Apr-2005  yamt fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.
 1.213 29-Mar-2005  yamt ip_reass: clear stale csum_flags.
 1.212 26-Feb-2005  perry branches: 1.212.2;
nuke trailing whitespace
 1.211 03-Feb-2005  perry ANSIfy function declarations
 1.210 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.209 24-Jan-2005  matt branches: 1.209.2;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.208 19-Dec-2004  christos branches: 1.208.2;
yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.
 1.207 17-Dec-2004  christos Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out
 1.206 15-Dec-2004  thorpej Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.205 06-Oct-2004  darrenr Add a comment to document what setting "srcrt" is really on about in ipintr()
 1.204 29-Sep-2004  christos PR/27081: Sean Boudreau: ip_input() bad csum count not incremented on sw csum
 1.203 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.202 02-May-2004  darrenr at line 543, we do a pullup here of hlen bytes into the mbuf,
so these later ones are superfluous.
 1.201 01-May-2004  matt Use EVCNT_ATTACH_STATIC{,2}
 1.200 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.199 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.198 01-Apr-2004  matt In ip_reass_ttl_descr, make i signed since it's compared to >= 0
 1.197 24-Mar-2004  atatat branches: 1.197.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.196 15-Jan-2004  itojun correct typo in 1.94 -> 1.95. pointed out by Shiva Shenoy
 1.195 14-Dec-2003  thorpej Fix syntax errors in CHECK_NMBCLUSTER_PARAMS().
 1.194 14-Dec-2003  jonathan Second part of hashed IP_reassembly changes:

When under pressure for mbufs or we have too many fragments in the IP
reassembly queue, drop half of all fragments. This multiplicative-drop
strategy ensures we return to a healthy state, even under borderline
denial-of-service from extremely lossy NFS-over-UDP peers.
The multiplicative-drop phase currently drops 50% of fragments, but
has pre-placed support for implementing drop-fractions other than 50%

The threshold for the `drop-half' phase is the new variable,
ip_maxfrags which is calculated as nmbclusters/4.

ip_input.c now keeps ip_nmbclusters, a cached copy of nmbclusters.
Before using limits derived from nmbclusters, we check if nmbclusters
and ip_nmbclusters are equal. If not, we recompute IP parameters
derived from nmbclusters. Based on a suggestion by Jason Thorpe.
ip_maxfrags is currently auto-recalculated.

The counters ip_nfrags and ip_nfragpacketsr are now declared static
and uninitialized (bss), to discourage tampering with them.
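
A minimal sketch of the threshold logic described in this entry follows. The variable names nmbclusters, ip_nmbclusters, ip_maxfrags and ip_nfrags come from the message; the function names, the initial value, and the exact comparison used to trigger the drop-half phase are assumptions.

```c
#include <assert.h>

static int nmbclusters = 1024;	/* stand-in for the kernel global */
static int ip_nmbclusters;	/* cached copy of nmbclusters */
static int ip_maxfrags;		/* drop-half threshold */
static int ip_nfrags;		/* fragments currently queued */

/* Recompute derived limits only if nmbclusters has changed. */
static void
ip_check_nmbclusters_sketch(void)
{
	if (ip_nmbclusters != nmbclusters) {
		ip_nmbclusters = nmbclusters;
		ip_maxfrags = nmbclusters / 4;
	}
}

/* Number of fragments to drop: zero below the threshold, otherwise
 * half of everything queued (multiplicative drop at 50%). */
static int
ip_frags_to_drop_sketch(void)
{
	ip_check_nmbclusters_sketch();
	if (ip_nfrags <= ip_maxfrags)
		return 0;
	return ip_nfrags / 2;
}
```

Dropping a fixed fraction rather than a fixed count means repeated pressure shrinks the queue geometrically, which is what brings it back to a healthy size.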
 1.193 12-Dec-2003  scw Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.
 1.192 08-Dec-2003  jonathan Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.
 1.191 07-Dec-2003  jonathan KNF: s/unsigned/u_int/, in a couple of places I missed.
 1.190 06-Dec-2003  jonathan Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
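
The idea of replacing one global list head with a hashtable of list heads can be sketched as a bucket-selection function. The table size and the hash below are hypothetical choices for illustration, not the values or the hash used in the tree.

```c
#include <assert.h>
#include <stdint.h>

#define REASS_NHASH_SKETCH	64	/* assumed size, a power of two */
#define REASS_HMASK_SKETCH	(REASS_NHASH_SKETCH - 1)

/* Pick a bucket from the source address and the IP id, so that all
 * fragments of one datagram land on the same, much shorter list. */
static unsigned
reass_hash_sketch(uint32_t src, uint16_t id)
{
	return (unsigned)((src ^ (src >> 16) ^ id) & REASS_HMASK_SKETCH);
}
```

A lookup then walks only the queue entries in bucket reass_hash_sketch(src, id) instead of every datagram under reassembly.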
 1.189 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.188 04-Dec-2003  scw ipflow (IP fast forwarding) is not compatible with FAST_IPSEC either.

XXX: The decision whether or not to fast forward should be made
XXX: dynamically. Using the current approach seriously reduces
XXX: routing performance on gateways with IPsec enabled.
 1.187 26-Nov-2003  itojun define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.
 1.186 24-Nov-2003  scw For FAST_IPSEC, ipfilter gets to see wire-format IPsec-encapsulated packets
only. Decapsulated packets bypass ipfilter. This mimics current behaviour
for Kame IPsec.
 1.185 19-Nov-2003  fvdl Correct number of arguments to sysctl_rdint.
 1.184 19-Nov-2003  jonathan Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.
 1.183 17-Nov-2003  jonathan Diff to netinet/ip_input.c (restore ip_id, initialize) for ip_id fix:

Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.182 12-Nov-2003  itojun KNF
 1.181 11-Nov-2003  jonathan Change global head-of-local-IP-address list from in_ifaddr to
in_ifaddrhead. Recent changes in struct names caused a namespace
collision in fast-ipsec, which are most cleanly fixed by using
"in_ifaddrhead" as the listhead name.
 1.180 10-Nov-2003  jonathan Make per-protocol network input queue stats visible to userland via
sysctl. Add a protocol-independent sysctl handler to show the per-protocol
"struct ifq" statistics. Add IP(v4) specific call to the handler.
Other protocols can show their per-protocol input statistics by
allocating a sysctl node and calling sysctl_ifq() with their own struct ifq *.

As posted to tech-kern plus improvements/cleanup suggested by Andrew Brown.
 1.179 28-Sep-2003  mycroft Remove some code that breaks AH tunnels completely. The comment describing
the purpose of this code appears to be on crack -- it's talking about
end-to-end authentication, but the purpose of an AH tunnel is NOT end-to-end
authentication; it's authentication of the tunnel endpoints.

NB: This does not fix the fact that IPsec leaks "packet tags."
 1.178 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.177 06-Sep-2003  itojun backout previous, we don't know if arc4random() collides on reboot.
 1.176 05-Sep-2003  itojun initialize fragment ID with arc4random, not by time.tv_sec
 1.175 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.174 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.173 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.172 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.171 14-Jul-2003  itojun correct igmp. from love
 1.170 03-Jul-2003  itojun minor KNF
 1.169 30-Jun-2003  itojun branches: 1.169.2;
do not generate ICMP redirect when packet filter alters ip_dst to an
address that resides on the same link. Cedric Berger convinced me that
it is necessary.
 1.168 30-Jun-2003  itojun fix indent
 1.167 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.166 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.165 11-Apr-2003  christos PR/991: Darren Reed: Add a sysctl (checkinterface) to implement this. This
implementation is taken from FreeBSD, but we default to off.
XXX: We should really do this on a per ifaddr basis as jason suggested.
 1.164 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.163 12-Nov-2002  itojun remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.
 1.162 12-Nov-2002  itojun backout previous - doesn't compile
 1.161 12-Nov-2002  itojun update ip_mtudisc sysctl change handling.
 1.160 10-Nov-2002  itojun always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se
 1.159 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.158 23-Sep-2002  itojun revert mtudisc_timeout value to the old one if update fails
 1.157 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.156 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.155 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.154 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
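
The *_HDR_ALIGNED_P() idea can be sketched as below: a compile-time constant on platforms without strict alignment, a pointer test otherwise. The macro and helper names carry a _SKETCH suffix because they are illustrative; the use of uintptr_t and the 4-byte mask are assumptions for a userland example.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __NO_STRICT_ALIGNMENT
/* No alignment constraints: the test costs nothing. */
#define IP_HDR_ALIGNED_P_SKETCH(p)	1
#else
/* Strict alignment: the header must start on a 4-byte boundary. */
#define IP_HDR_ALIGNED_P_SKETCH(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/* Returns nonzero if the header would have to be copied to an
 * aligned location (the m_copyup() path) before being accessed. */
static int
ip_hdr_needs_realign_sketch(const void *hdr)
{
	return !IP_HDR_ALIGNED_P_SKETCH(hdr);
}
```

When drivers already deliver aligned packets, the helper always returns zero and the align-if-needed path is never visited.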
 1.153 13-Jun-2002  itojun set IPv4 parameter to modern value.
- turn on path MTU discovery (previous: turned off)
- ICMPv4 redirect entry timeout = 600 sec (previous: never timeout)
 1.152 09-Jun-2002  itojun whitespace
 1.151 07-Jun-2002  itojun look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>
 1.150 12-May-2002  matt branches: 1.150.2; 1.150.4;
Eliminate commons.
 1.149 12-May-2002  wiz Spelling fixes, from Sergey Svishchev in kern/16650.
 1.148 07-May-2002  matt Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.
 1.147 18-Apr-2002  matt Change test for M_EXT to M_READONLY for MROUTING. We only need to do
a pullup if we aren't allowed to modify the packet.
 1.146 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.145 25-Feb-2002  itojun correctly enforce ipsec policy check on forwarding case.
From: Greg Troxel <gdt@ir.bbn.com>, Bill Chiarchiaro <wjc@work.cleartech.com>
 1.144 24-Feb-2002  martin Clear M_BCAST and M_MCAST on outgoing mbufs.
Don't copy ttl from the inner packet to the encapsulating packet. Make
the outer ttl sysctl'able. This should close PR 14269 from Jasper Wallace
(change partly from there) and it makes traceroute work over gre tunnels.
 1.143 21-Feb-2002  itojun suppress source quench message, based on router-req RFC (also could be abused
as DoS traffic generator). from kjc/kame
 1.142 28-Nov-2001  darrenr recompute hlen after calling pfil_run_hooks() in case ip_hl was changed.
 1.141 13-Nov-2001  lukem add RCSIDs
 1.140 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.139 04-Nov-2001  matt Change a few variable/tables to const since they are read-only.
 1.138 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.137 17-Sep-2001  thorpej branches: 1.137.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.136 06-Aug-2001  itojun branches: 1.136.2;
cache IPsec policy on in6?pcb. most of the lookup operations can be bypassed,
especially when it is a connected SOCK_STREAM in6?pcb. sync with kame.
 1.135 02-Jun-2001  thorpej branches: 1.135.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
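
The two-stage scheme described in this entry (pre-compute and cache the pseudo-header sum, finish the checksum later in software if the hardware cannot) can be sketched with plain one's-complement arithmetic. The function names and the byte-array interface are hypothetical; the kernel works on mbufs and per-packet checksum flags instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a running sum. */
static uint32_t
cksum_add_sketch(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)			/* odd trailing byte */
		sum += (uint32_t)buf[i] << 8;
	return sum;
}

/* Fold carries back in to get a 16-bit one's-complement sum. */
static uint16_t
cksum_fold_sketch(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Stage 1: sum of the pseudo-header (src, dst, protocol, length).
 * This is the value that gets cached with the packet. */
static uint16_t
pseudo_hdr_sum_sketch(const uint8_t src[4], const uint8_t dst[4],
    uint8_t proto, uint16_t len)
{
	uint32_t sum = 0;

	sum = cksum_add_sketch(sum, src, 4);
	sum = cksum_add_sketch(sum, dst, 4);
	sum += proto;
	sum += len;
	return cksum_fold_sketch(sum);
}

/* Stage 2 (delayed): add the transport header and payload to the
 * cached sum and complement the result. */
static uint16_t
finish_cksum_sketch(uint16_t cached, const uint8_t *seg, size_t len)
{
	return (uint16_t)~cksum_fold_sketch(cksum_add_sketch(cached, seg, len));
}
```

Running stage 2 over a segment whose checksum field is already filled in yields zero, which is how a software verification of an in-bound packet would use the same routines.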
 1.134 21-May-2001  lukem fix spelo in comment
 1.133 16-Apr-2001  itojun give a default value to net.inet.ip.maxfragpackets, to protect us from
"lots of fragmented packets" DoS attack.

the current default value is derived from ipv6 counterpart, which is
a magical value "200". it should be enough for normal systems, not sure
if it is enough when you take hundreds of thousands of tcp connections on
your system. if you have proposal for a better value with concrete reasons,
let me know.
 1.132 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.131 27-Mar-2001  itojun net.inet.ip.maxfragpackets defines the maximum size of ip reass queue
(prevents fragment flood from chewing up mbuf memory space).
derived from KAME net.inet6.ip6.maxfragpackets.
 1.130 02-Mar-2001  itojun branches: 1.130.2;
increase ipstat.ips_badaddr if the packet fails to pass address checks.
 1.129 02-Mar-2001  itojun reject packets with 127/8 on IPv4 src/dst, they must not appear on wire
(RFC1122). torture-tests will be welcomed.
XXX do we want to check source routing headers as well?
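
A minimal sketch of the 127/8 address check, assuming host-byte-order addresses and a flag that says whether the packet arrived on a loopback interface. The names and the loopback-interface exemption are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOOPBACKNET_SKETCH	127	/* first octet of 127/8 */

/* True if a host-byte-order IPv4 address lies in 127/8. */
static bool
in_loopback_net_sketch(uint32_t addr)
{
	return (addr >> 24) == LOOPBACKNET_SKETCH;
}

/* Accept a packet unless its source or destination is in 127/8 and
 * it did not arrive on a loopback interface (assumed exemption). */
static bool
ip_addr_check_ok_sketch(uint32_t src, uint32_t dst, bool rcvif_is_loopback)
{
	if (in_loopback_net_sketch(src) || in_loopback_net_sketch(dst))
		return rcvif_is_loopback;
	return true;
}
```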
 1.128 01-Mar-2001  itojun make sure to enforce inbound ipsec policy checking, for any protocols on top
of ip (check it when final header is visited). sync with kame.
XXX kame team will need to re-check policy engine code
 1.127 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.126 28-Dec-2000  thorpej Back out the sledgehammer damage applied by wiz while I was out for
the holiday.
 1.125 25-Dec-2000  wiz Back out previous change. It causes NAT to fail, and was CLEARLY
NOT TESTED before it was committed.
 1.124 22-Dec-2000  thorpej Slight adjustment to how pfil_head's are registered. Instead of a
"key" and a "dlt", use a "type" (PFIL_TYPE_{AF,IFNET} for now) and
a val/ptr appropriate for that type. This allows for more future
flexibility with the pfil_hook mechanism.
 1.123 14-Dec-2000  thorpej Add ALTQ glue. XXX Temporary until ALTQ is changed to use a pfil hook.
 1.122 24-Nov-2000  itojun IFA_STATS stability (not complete); don't touch ip if it is NULL.
 1.121 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, an mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
 1.120 08-Nov-2000  ad Update for hashinit() change.
 1.119 13-Oct-2000  itojun make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.
 1.118 26-Aug-2000  itojun make sure anonport{min,max} is not negative number
 1.117 25-Aug-2000  tron Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.
 1.116 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.115 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.114 10-May-2000  itojun branches: 1.114.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.
 1.113 10-May-2000  itojun correct more out-of-bounds memory access, if cnt == 1 and optlen > 1.
 1.112 06-May-2000  sommerfeld Handle large offsets with very small options correctly.
 1.111 31-Mar-2000  jdolecek Slightly improve previous - only include <netinet/ip_mroute.h> if MROUTING
is defined.
 1.110 31-Mar-2000  jdolecek include <netinet/ip_mroute.h> for ip_mforward() - needed after
last duplicate prototype sweep (prototype for ip_mforward() used to be in <netinet/ip_var.h>)
 1.109 30-Mar-2000  augustss Remove register declarations.
 1.108 30-Mar-2000  simonb Delete uninitialised declaration of ip_defttl - there's an initialised
decl earlier in this file.
 1.107 10-Mar-2000  thorpej Back out previous, and adjust a comment.
 1.106 07-Mar-2000  thorpej Back out part of 1.104 which isn't actually needed.
 1.105 03-Mar-2000  itojun remove unnecessary ttl initialization which I mistakenly brought in
during KAME merge (this is part of WIDE's experimental reass code...)
NetBSD PR: 9412
From: Wolfgang Rupprecht <wolfgang@wsrcc.com>
Fix from: ho@crt.se
itojun was notified from: theo
 1.104 02-Mar-2000  thorpej Avoid a bug in GCC which manifests itself when processing unaligned
IP options. Problem pointed out by Matt Hargett and Erik Fair, analyzed
by me.
 1.103 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.102 20-Feb-2000  darrenr pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".
 1.101 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.100 16-Feb-2000  itojun - if ip_dst matches address on !IFF_UP interface, and
- there's no match against addresses on IFF_UP interface,
send icmp unreach if I'm router. drop it if I'm host.

Revised version of PR: 9387 from nrt@iij.ad.jp. Discussed with thorpej+nrt.
 1.99 12-Feb-2000  thorpej Typo (Thanks, Havard :-)
 1.98 12-Feb-2000  thorpej Small cosmetic change, and note a place where a statistic should be
gathered.
 1.97 11-Feb-2000  itojun fix in-kernel packet forwarding loop (till TTL becomes 0) when:
- a packet is delivered to an address X,
- and the address X is configured on my !IFF_UP interface
- and ipforwarding=1

NetBSD PR: 9387
From: nrt@iij.ad.jp
 1.96 01-Feb-2000  thorpej Use ifatoia() and sintosa() consistently, rather than using home-grown
casting macros intermixed.
 1.95 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.94 26-Oct-1999  itojun disable ipflow (IPv4 fast forwarding) when IPsec is configured into the kernel.
 1.93 17-Oct-1999  sommerfeld branches: 1.93.2; 1.93.4;
In ip_forward():

Avoid forwarding ip unicast packets which were contained inside
link-level multicast packets; having M_MCAST still set in the packet
header flags will mean that the packet will get multicast to a bogus
group instead of unicast to the next hop.

Malformed packets like this have occasionally been spotted "in the
wild" on a mediaone cable modem segment which also had multiple netbsd
machines running as router/NAT boxes.

Without this, any subnet with multiple netbsd routers receiving all
multicasts will generate a packet storm on receipt of such a
multicast. Note that we already do the same check here for link-level
broadcasts; ip6_forward already does this as well.

Note that multicast forwarding does not go through ip_forward().

Adding some code to if_ethersubr to sanity check link-level
vs. ip-level multicast addresses might also be worthwhile.
 1.92 23-Jul-1999  itojun branches: 1.92.2;
do not include unnecessary include files.
 1.91 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.90 06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.89 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.88 26-Jun-1999  sommerfeld If the new global variable hostzerobroadcast is zero, no longer assume
address zero of each net/subnet is a broadcast address.
(The default value is nonzero, which preserves the current behavior).

This can be set using sysctl; the boot-time default can also be
configured using the HOSTZEROBROADCAST kernel config option.

While we're here, defopt HOSTZEROBROADCAST and SUBNETSARELOCAL
 1.87 04-May-1999  hwr It does not make much sense to increase an "output" counter on input.
 1.86 03-May-1999  thorpej In INADDR_TO_IA(), skip interfaces which are not up. Revert previous change
to ip_input.c to check the interface status after INADDR_TO_IA().

Fix cooked up by Heiko Rupp and myself.

Fixes PR 7480.
 1.85 03-May-1999  hwr Drop packets that have a Class-D address as source address.
Implements the first half of PR 7003.
 1.84 07-Apr-1999  proff tiny KNF change
 1.83 07-Apr-1999  proff Prevent reception of packets on downed interfaces (via an up interface).
fixes kern/7327
 1.82 27-Mar-1999  aidan branches: 1.82.2;
Added per-addr input/output statistics. Currently supports just netatalk
and netinet, and has only been tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.
 1.81 26-Mar-1999  proff security: test for ip_len < (ip_hl << 2) and drop packet accordingly
 1.80 19-Jan-1999  mycroft There's just no plausible reason to byte-swap ip_id internally. It's opaque.
 1.79 19-Jan-1999  mycroft Don't screw with ip_len; just subtract from it where we actually use the
value.
 1.78 19-Jan-1999  mycroft Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.
 1.77 11-Jan-1999  thorpej Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.
 1.76 19-Dec-1998  thorpej Reverse the copyright-notice-swap. It went against existing practice.
 1.75 18-Dec-1998  thorpej Add a lock around the IP fragment reassembly queue, to prevent ip_drain()
from corrupting the queue if called from a device's interrupt context.

Should fix PR #5684.
 1.74 13-Nov-1998  thorpej branches: 1.74.2;
Once a fragmented IP packet has been reassembled, recompute the packet
length before passing it up the stack. From FreeBSD.
 1.73 08-Oct-1998  thorpej Use the pool allocator for ipflow entries.
 1.72 08-Oct-1998  thorpej Use the pool allocator for ipqent structures.
 1.71 30-Sep-1998  tls Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.
 1.70 09-Sep-1998  thorpej Make a diagnostic printf more sensible, PR #5951, Heiko W. Rupp.
 1.69 09-Aug-1998  mrg defopt PFIL_HOOKS.
 1.68 17-Jul-1998  sommerfe Fix PR5508: ipfil cut-through forwarding causes panic
 1.67 01-Jun-1998  thorpej Protect the ipflow_reap() call with splsoftnet.
 1.66 24-May-1998  thorpej Fix OBOB in IP timestamp option processing, as noted in FreeBSD PR 6738,
from Jennifer Dawn Meyers <jdm@enteract.com>.
 1.65 04-May-1998  matt Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.
 1.64 01-May-1998  thorpej Allow packet filters to prevent a packet from creating a fast-forwarding
flow, by setting the "can fast forward" flag in the packet header, and
giving a chance for filters to clear the flag. If the flag is still
set after the filters have given it a chance, the packet will be used
to create a fast-forward flow entry.
 1.63 29-Apr-1998  matt Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
 1.62 29-Apr-1998  matt defopt GATEWAY
 1.61 29-Apr-1998  kml change path MTU timeout value to match RFC 1191
 1.60 29-Apr-1998  kml Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.
 1.59 19-Mar-1998  mrg convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.
 1.58 15-Feb-1998  tls Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.
 1.57 13-Feb-1998  tls Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
 1.56 28-Jan-1998  thorpej Use offsetof() from libkern.h
 1.55 12-Jan-1998  scottr Use option header file for MROUTING
 1.54 05-Jan-1998  lukem enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
 1.53 18-Oct-1997  kml branches: 1.53.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
 1.52 17-Oct-1997  thorpej Allow `subnetsarelocal' to be changed via sysctl.
 1.51 29-Aug-1997  gwr Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)
 1.50 24-Jun-1997  thorpej branches: 1.50.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.
 1.49 15-Apr-1997  christos Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?
 1.48 26-Feb-1997  mrg allow src-routed packets by default, per host requirements
 1.47 25-Feb-1997  cjs Add net.inet.ip.allowsrcrt option which allows/drops all source
routed packets. This currently defaults to `drop,' but once we
verify that all applications that rely on determining remote IP
addresses for authentication are dropping the connection when they
see a source route option (not just disabling the source route
option), we can turn this back on and conform with the host
requirements.
 1.46 19-Feb-1997  cjs Fix bug in sysctl net.inet.ip.forwsrcrt handling: now you can read it
if securelevel > 0. (Thanks, cgd.)
 1.45 18-Feb-1997  mrg pseudo-device ipfilter brings in PFIL_HOOKS.
 1.44 11-Jan-1997  thorpej branches: 1.44.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
 1.43 20-Dec-1996  mrg in pfil_hooks: always reassign ip after calling hook.
 1.42 20-Dec-1996  mrg remove pfil_bad.
 1.41 25-Oct-1996  thorpej Before concatenating frags, sanity check the length of the packet. If it's
larger than IP_MAXPACKET, discard it.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
 1.40 22-Oct-1996  veego Fix a panic from the pfil_hooks.
 1.39 13-Oct-1996  christos backout previous kprintf changes
 1.38 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.37 21-Sep-1996  perry commit fix in pr 2772 -- the IP input code was assuming that the
reserved (must be zero) flag must necessarily be zero. We now define
an IP_RF (by analogy to IP_DF and IP_MF) and mask it out when necessary.
 1.36 14-Sep-1996  mrg move the packet filter hooks into a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.
 1.35 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.34 08-Sep-1996  mycroft Save 68 bytes of the packet for ICMP, not 64. From Laine Stump, PR 2296.
 1.33 06-Sep-1996  mrg add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.
 1.32 14-Aug-1996  thorpej Fix some DIAGNOSTIC printf() formats; ntohl() provides a 32-bit quantity,
and should be printed with %x, not %lx.
 1.31 10-Jul-1996  cgd print result of ntohl/htonl as a long. (makes -Wformat work on the
Alpha.)
 1.30 16-Mar-1996  christos branches: 1.30.4;
Fix printf format args.
 1.29 26-Feb-1996  mrg two more local addr changes, all done differently now (idea from charles)
 1.28 13-Feb-1996  christos netinet prototypes
 1.27 16-Jan-1996  thorpej Add a net.inet.ip.directed-broadcast sysctl as suggested by
Darren Reed <darrenr@vitruvius.arbld.unimelb.edu.au> in PR #1227.
This change is slightly different than the one submitted by Darren in
that the DIRECTED_BROADCAST compile-time option will behave like it used
to so that existing configurations utilizing it won't have to change.
 1.26 15-Jan-1996  thorpej Add net.inet.ip.forwsrcrt: if zero, the system will not forward
source-routed packets. Note this value is protected by kernel security
level; it can only be changed if securelevel < 1.
 1.25 21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
 1.24 12-Aug-1995  mycroft splnet --> splsoftnet
 1.23 12-Jun-1995  mycroft Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.
 1.22 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.21 07-Jun-1995  mycroft Remove ip_ifmatrix completely.
 1.20 04-Jun-1995  mycroft Don't cast things unnecessarily.
 1.19 04-Jun-1995  mycroft Clean up many more casts.
 1.18 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.17 15-May-1995  cgd oops; forgot a '{'
 1.16 14-May-1995  cgd drop (and record) malformed IP fragments. Fixes pr 1030 (differently).
 1.15 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.14 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.13 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.12 14-Feb-1994  mycroft PARANOID --> DIAGNOSTIC for inexpensive tests.
 1.11 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.10 29-Jan-1994  brezak Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect, though; at least this fixes some panics.
 1.9 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.8 09-Jan-1994  mycroft Prototype the rest.
 1.7 08-Jan-1994  mycroft More prototypes.
 1.6 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 20-May-1993  cgd branches: 1.3.4;
more rcsid additions and file header cleanups
 1.2 04-May-1993  cgd make ip_input recursion checking be for -DPARANOID, and make it panic
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.2 14-Nov-1993  mycroft PARANOID --> DIAGNOSTIC. These are not expensive tests.
 1.3.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.30.4.3 11-Dec-1996  mycroft From trunk:
Save 68 bytes of the packet for ICMP, not 64.
 1.30.4.2 11-Dec-1996  mycroft From trunk:
Ignore the reserved fragment flag when checking ip_off.
 1.30.4.1 10-Nov-1996  thorpej Update from trunk:
- Make ip_len and ip_off unsigned.
- Make sure we don't accept or transmit packets larger than the
maximum IP packet size.
This fixes the so-called `death ping' bug.

Sum of work from Bill Fenner <fenner@parc.xerox.com>,
Kevin Lahey <kml@nas.nasa.gov>, and myself.

Thanks to Curt Sampson, Jukka Marin, and Kevin Lahey for testing
this under NetBSD 1.2
 1.44.4.1 12-Mar-1997  is Merge in changes from Trunk
 1.50.4.1 01-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.53.2.4 15-Nov-1998  cgd pull up rev 1.74 from trunk (thorpej)
 1.53.2.3 01-Oct-1998  cgd pull up revisions 1.57-1.58 (via patch), 1.71 (via patch) from trunk. (tls)
 1.53.2.2 22-Jul-1998  mellon Pull up 1.59 and 1.68 (veego)
 1.53.2.1 09-May-1998  mycroft Pull up patch from kml.
 1.74.2.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.82.2.7 30-May-2001  he Pull up revisions 1.131,1.133 (via patch, requested by he):
Introduce net.inet.ip.maxfragpackets, which controls the maximum
number of IPv4 fragment reassembly queue entries. Defends against
certain DoS attacks. Fixes SA#2001-006.
 1.82.2.6 06-May-2000  he Pull up revision 1.112 (requested by sommerfeld):
Handle large offsets inside very small options correctly.
 1.82.2.5 02-Mar-2000  he Pull up revision 1.104 (requested by thorpej):
Work around a compiler bug that causes a security vulnerability
in our IP stack on some platforms.
 1.82.2.4 12-Feb-2000  he Apply patch (requested by thorpej):
Adhere to RFC 1112 and RFC 1122 by dropping incoming packets with
a multicast source address. Fixes part of PR#7003.
 1.82.2.3 17-Oct-1999  cgd pull up rev 1.93 from trunk (requested by sommerfeld):
Multicast storm prevention: don't attempt to forward link-level
multicast packets which contain ip unicast packets; these packets
would only be generated from misconfigured/buggy systems.
 1.82.2.2 03-May-1999  perry branches: 1.82.2.2.2; 1.82.2.2.4;
pullup 1.85->1.86 (thorpej)
 1.82.2.1 07-Apr-1999  proff pullup 1.82 - 1.83; don't receive packets on downed interface addresses
 1.82.2.2.4.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.82.2.2.4.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.82.2.2.4.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.82.2.2.2.3 02-Aug-1999  thorpej Update from trunk.
 1.82.2.2.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.82.2.2.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.92.2.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.93.4.1 15-Nov-1999  fvdl Sync with -current
 1.93.2.8 21-Apr-2001  bouyer Sync with HEAD
 1.93.2.7 27-Mar-2001  bouyer Sync with HEAD.
 1.93.2.6 12-Mar-2001  bouyer Sync with HEAD.
 1.93.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.93.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.93.2.3 08-Dec-2000  bouyer Sync with HEAD.
 1.93.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.93.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.114.4.10 13-Nov-2002  itojun sys/net/route.c 1.55 via patch
sys/net/route.h 1.32
sys/netinet/ip_input.c 1.163

Remove all entries on rt timer queue on ip_mtudisc change, instead
of destroying the queue.

(itojun, redo)
 1.114.4.9 10-Nov-2002  itojun sys/netinet/ip_input.c 1.160 via patch

Always create PMTUD timeout queue, as PMTUD can be turned on via
sysctl at runtime. From lha@stacken.kth.se.

(itojun)
 1.114.4.8 26-Feb-2002  he Pull up revision 1.145 (requested by itojun):
Correctly enforce ipsec policy check in IPv4 forwarding case.
 1.114.4.7 26-Feb-2002  he Pull up revision 1.144 (requested by martin):
Clear M_BCAST and M_MCAST on encapsulated packets on outgoing
mbufs. Also do not copy TTL from the inner packet, and make the
outer TTL sysctl'able. Fixes PR#14269, and makes traceroute work
over GRE tunnels.
 1.114.4.6 24-Apr-2001  he Pull up revisions 1.131,1.133 (requested by itojun):
Introduce net.inet.ip.maxfragpackets, which controls the maximum
number of IPv4 fragment reassembly queue entries. Defends against
certain DoS attacks.
 1.114.4.5 06-Apr-2001  he Pull up revision 1.127 (via patch, requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.114.4.4 11-Mar-2001  he Pull up revision 1.128 (requested by itojun):
Ensure that we enforce inbound IPsec policy on all IP protocols,
not just TCP, UDP and ICMP.
 1.114.4.3 17-Oct-2000  tv Pullup 1.119 [itojun]:
make sure we don't share external mbuf between m and mcopy, in ip_forward().
should solve PR 11201.
 1.114.4.2 27-Aug-2000  itojun pullup 1.117 -> 1.118 (approved by releng-1-5)

> make sure anonport{min,max} is not negative number
 1.114.4.1 26-Aug-2000  tron Pull up from current (approved by thorpej):

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.

syssrc/sys/netinet/in.h 1.49 -> 1.50
syssrc/sys/netinet/in_pcb.c 1.66 -> 1.67
syssrc/sys/netinet/ip_input.c 1.116 -> 1.117
syssrc/sys/netinet/ip_var.h 1.41 -> 1.42
 1.130.2.16 11-Dec-2002  thorpej Sync with HEAD.
 1.130.2.15 11-Nov-2002  nathanw Catch up to -current
 1.130.2.14 18-Oct-2002  nathanw Catch up to -current.
 1.130.2.13 17-Sep-2002  nathanw Catch up to -current.
 1.130.2.12 27-Aug-2002  nathanw Catch up to -current.
 1.130.2.11 01-Aug-2002  nathanw Catch up to -current.
 1.130.2.10 20-Jun-2002  nathanw Catch up to -current.
 1.130.2.9 04-May-2002  thorpej Update from trunk.
 1.130.2.8 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.130.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.130.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.130.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.130.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.130.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.130.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.130.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.135.2.6 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.135.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.135.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.135.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.135.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.135.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.136.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.137.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.150.4.3 17-Jun-2003  msaitoh Pullup rev. 1.163 via patch (requested by itojun in ticket #984):
remove all entries in rt timer queue on ip_mtudisc change, instead of
destroying the queue.
 1.150.4.2 12-Nov-2002  tron Pull up revision 1.160 (requested by itojun in ticket #977):
always create pmtud timeout queue, as ip_mtudisc can be tweaked via
sysctl at runtime. From lha@stacken.kth.se
 1.150.4.1 07-Jun-2002  thorpej pullup-1-6 ticket #202:

syssrc/sys/netinet/ip_input.c 1.151

Original log message:

look at rmx_mtu on IPsec tunnel MTU computation.
From: David Waitzman <djw@bbn.com>
 1.150.2.3 29-Aug-2002  gehenna catch up with -current.
 1.150.2.2 15-Jul-2002  gehenna catch up with -current.
 1.150.2.1 20-Jun-2002  gehenna catch up with -current.
 1.169.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.169.2.7 01-Apr-2005  skrll Sync with HEAD.
 1.169.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.169.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.169.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.169.2.3 18-Dec-2004  skrll Sync with HEAD.
 1.169.2.2 19-Oct-2004  skrll Sync with HEAD
 1.169.2.1 03-Aug-2004  skrll Sync with HEAD
 1.197.2.1 28-May-2004  tron Pull up revision 1.203 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.208.2.1 29-Apr-2005  kent sync with -current
 1.209.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.209.2.1 12-Feb-2005  yamt sync with head.
 1.212.2.3 17-Sep-2007  bouyer Pull up following revision(s) (requested by degroote in ticket #1840):
sys/netinet/ip_input.c: revision 1.253
In some FAST_IPSEC, spl level is not restored correctly. Fix that.
Spotted by Wolfgang Stukenbrock in pr/36800
 1.212.2.2 06-May-2005  tron branches: 1.212.2.2.2; 1.212.2.2.4;
Pull up revision 1.214 (requested by yamt in ticket #251):
fix problems related to loopback interface checksum omission. PR/29971.
- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)
ok'ed by Jason Thorpe.
 1.212.2.1 04-Apr-2005  tron Pull up revision 1.213 (requested by yamt in ticket #88):
ip_reass: clear stale csum_flags.
 1.212.2.2.4.1 17-Sep-2007  bouyer Pull up following revision(s) (requested by degroote in ticket #1840):
sys/netinet/ip_input.c: revision 1.253
In some FAST_IPSEC, spl level is not restored correctly. Fix that.
Spotted by Wolfgang Stukenbrock in pr/36800
 1.212.2.2.2.1 17-Sep-2007  bouyer Pull up following revision(s) (requested by degroote in ticket #1840):
sys/netinet/ip_input.c: revision 1.253
In some FAST_IPSEC, spl level is not restored correctly. Fix that.
Spotted by Wolfgang Stukenbrock in pr/36800
 1.218.2.9 11-Feb-2008  yamt sync with head.
 1.218.2.8 21-Jan-2008  yamt sync with head
 1.218.2.7 07-Dec-2007  yamt sync with head
 1.218.2.6 15-Nov-2007  yamt sync with head.
 1.218.2.5 27-Oct-2007  yamt sync with head.
 1.218.2.4 03-Sep-2007  yamt sync with head.
 1.218.2.3 26-Feb-2007  yamt sync with head.
 1.218.2.2 30-Dec-2006  yamt sync with head.
 1.218.2.1 21-Jun-2006  yamt sync with head.
 1.219.2.2 02-Nov-2005  yamt sync with head.
 1.219.2.1 26-Oct-2005  yamt sync with head
 1.223.6.3 01-Jun-2006  kardel Sync with head.
 1.223.6.2 22-Apr-2006  simonb Sync with head.
 1.223.6.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.223.4.1 09-Sep-2006  rpaulo sync with head
 1.223.2.1 01-Mar-2006  yamt sync with head.
 1.224.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.224.4.2 11-May-2006  elad sync with head
 1.224.4.1 19-Apr-2006  elad sync with head.
 1.224.2.5 14-Sep-2006  yamt sync with head.
 1.224.2.4 03-Sep-2006  yamt sync with head.
 1.224.2.3 11-Aug-2006  yamt sync with head
 1.224.2.2 26-Jun-2006  yamt sync with head.
 1.224.2.1 24-May-2006  yamt sync with head.
 1.226.2.1 19-Jun-2006  chap Sync with head.
 1.229.2.3 01-Feb-2007  ad Sync with head.
 1.229.2.2 12-Jan-2007  ad Sync with head.
 1.229.2.1 18-Nov-2006  ad Sync with head.
 1.231.2.3 18-Dec-2006  yamt sync with head.
 1.231.2.2 10-Dec-2006  yamt sync with head.
 1.231.2.1 22-Oct-2006  yamt sync with head
 1.236.4.2 03-Jun-2008  skrll Sync with netbsd-4.
 1.236.4.1 23-Sep-2007  wrstuden Sync with somewhat-recent netbsd-4.
 1.236.2.2 30-Mar-2008  jdc Pull up revisions:
src/sys/netinet/ip_input.c 1.263
src/sys/netinet/tcp_subr.c 1.225
(requested by cube in ticket #1109).

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.
 1.236.2.1 16-Sep-2007  xtraeme branches: 1.236.2.1.4;
Pull up following revision(s) (requested by degroote in ticket #881):
sys/netinet/ip_input.c: revision 1.253
sys/netinet6/ip6_input.c: revision 1.110

In some FAST_IPSEC cases, the spl level is not restored correctly. Fix that.
Spotted by Wolfgang Stukenbrock in pr/36800
 1.236.2.1.4.1 30-Mar-2008  jdc Pull up revisions:
src/sys/netinet/ip_input.c 1.263
src/sys/netinet/tcp_subr.c 1.225
(requested by cube in ticket #1109).

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.
 1.242.2.5 07-May-2007  yamt sync with head.
 1.242.2.4 15-Apr-2007  yamt sync with head.
 1.242.2.3 24-Mar-2007  yamt sync with head.
 1.242.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.242.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.245.2.5 09-Oct-2007  ad Sync with head.
 1.245.2.4 20-Aug-2007  ad Sync with HEAD.
 1.245.2.3 08-Jun-2007  ad Sync with head.
 1.245.2.2 10-Apr-2007  ad Sync with head.
 1.245.2.1 13-Mar-2007  ad Sync with head.
 1.246.4.1 29-Mar-2007  reinoud Pullup to -current
 1.246.2.1 11-Jul-2007  mjf Sync with head.
 1.249.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.249.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.250.6.2 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.250.6.1 19-Jul-2007  dyoung file ip_input.c was added on branch matt-mips64 on 2007-07-19 20:48:56 +0000
 1.250.4.6 27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.250.4.5 11-Nov-2007  joerg Sync with HEAD.
 1.250.4.4 04-Oct-2007  joerg Sync with HEAD.
 1.250.4.3 02-Oct-2007  joerg Sync with HEAD.
 1.250.4.2 03-Sep-2007  jmcneill Sync with HEAD.
 1.250.4.1 16-Aug-2007  jmcneill Sync with HEAD.
 1.251.2.3 23-Mar-2008  matt sync with HEAD
 1.251.2.2 09-Jan-2008  matt sync with HEAD
 1.251.2.1 06-Nov-2007  matt sync with HEAD
 1.253.2.1 06-Oct-2007  yamt sync with head.
 1.254.4.4 18-Feb-2008  mjf Sync with HEAD.
 1.254.4.3 27-Dec-2007  mjf Sync with HEAD.
 1.254.4.2 08-Dec-2007  mjf Sync with HEAD.
 1.254.4.1 19-Nov-2007  mjf Sync with HEAD.
 1.254.2.1 13-Nov-2007  bouyer Sync with HEAD
 1.256.6.2 19-Jan-2008  bouyer Sync with HEAD
 1.256.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.256.2.1 26-Dec-2007  ad Sync with head.
 1.262.6.5 17-Jan-2009  mjf Sync with HEAD.
 1.262.6.4 05-Oct-2008  mjf Sync with HEAD.
 1.262.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.262.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.262.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.266.2.1 18-May-2008  yamt sync with head.
 1.268.2.6 11-Aug-2010  yamt sync with head.
 1.268.2.5 11-Mar-2010  yamt sync with head
 1.268.2.4 19-Aug-2009  yamt sync with head.
 1.268.2.3 18-Jul-2009  yamt sync with head.
 1.268.2.2 04-May-2009  yamt sync with head.
 1.268.2.1 16-May-2008  yamt sync with head.
 1.272.6.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.272.6.1 19-Oct-2008  haad Sync with HEAD.
 1.272.2.2 10-Oct-2008  skrll Sync with HEAD.
 1.272.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.275.4.1 25-Nov-2008  snj branches: 1.275.4.1.8;
Pull up following revision(s) (requested by rmind in ticket #119):
sys/netinet/ip_input.c: revision 1.276
ip_input: fix an IPQ "lock" leak. (hi <matt>!)
 1.275.4.1.8.2 07-Jan-2011  matt Back out an inadvertent change.
 1.275.4.1.8.1 07-Jan-2011  matt If using hardware checksum offload and the packet can't be h/w checksummed
(for whatever reason, some hardware is stupid) allow the driver to calculate
the checksum instead.
 1.275.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.275.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.278.2.2 23-Jul-2009  jym Sync with HEAD.
 1.278.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.284.4.4 31-May-2011  rmind sync with head
 1.284.4.3 21-Apr-2011  rmind sync with head
 1.284.4.2 05-Mar-2011  rmind sync with head
 1.284.4.1 30-May-2010  rmind sync with head
 1.284.2.3 06-Nov-2010  uebayasi Sync with HEAD.
 1.284.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.284.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.293.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.296.6.2 05-Apr-2012  mrg sync to latest -current.
 1.296.6.1 18-Feb-2012  mrg merge to -current.
 1.296.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.296.2.3 16-Jan-2013  yamt sync with (a bit old) head
 1.296.2.2 30-Oct-2012  yamt sync with head
 1.296.2.1 17-Apr-2012  yamt sync with head
 1.298.8.1 09-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1526):
sys/netinet/ip_input.c: revision 1.366

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:
source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network
And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.
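
The LSRR scenario above depends on a source-route option being present in the IPv4 header. As a minimal illustration only (this is not the code from ip_input.c; every name prefixed with sk_ or SK_ is hypothetical), the following sketch scans the options area of an IPv4 header for the LSRR (0x83) or SSRR (0x89) option type:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4 option type values; the SK_ names are local to this sketch. */
#define SK_IPOPT_EOL  0x00	/* end of option list */
#define SK_IPOPT_NOP  0x01	/* no operation (one byte, no length) */
#define SK_IPOPT_LSRR 0x83	/* loose source and record route */
#define SK_IPOPT_SSRR 0x89	/* strict source and record route */

/*
 * Return 1 if the options area contains a source-route option type,
 * 0 otherwise. Malformed option lengths end the scan.
 */
static int
sk_has_source_route(const uint8_t *opts, size_t len)
{
	size_t i = 0;

	while (i < len) {
		uint8_t type = opts[i];
		uint8_t optlen;

		if (type == SK_IPOPT_EOL)
			break;
		if (type == SK_IPOPT_NOP) {
			i++;
			continue;
		}
		if (type == SK_IPOPT_LSRR || type == SK_IPOPT_SSRR)
			return 1;
		if (i + 1 >= len)
			break;		/* no room for a length byte */
		optlen = opts[i + 1];
		if (optlen < 2 || optlen > len - i)
			break;		/* malformed length */
		i += optlen;
	}
	return 0;
}
```

A filter built on such a check could drop or refuse to forward these packets; the change described above instead turns the two behaviours off by default.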
 1.298.6.1 09-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1526):
sys/netinet/ip_input.c: revision 1.366

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:
source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network
And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.
 1.298.2.1 09-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1526):
sys/netinet/ip_input.c: revision 1.366

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:
source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network
And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.
 1.302.2.4 03-Dec-2017  jdolecek update from HEAD
 1.302.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.302.2.2 23-Jun-2013  tls resync from head
 1.302.2.1 25-Feb-2013  tls resync with head
 1.307.2.3 18-May-2014  rmind sync with head
 1.307.2.2 28-Aug-2013  rmind sync with head
 1.307.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.310.2.1 10-Aug-2014  tls Rebase.
 1.319.10.2 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #1708):

sys/netinet6/ip6_input.c: revision 1.209 via patch
sys/netinet/ip_input.c: revision 1.390 via patch

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.319.10.1 09-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1563):
sys/netinet/ip_input.c: revision 1.366 (via patch)

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:
source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network
And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.
 1.319.6.2 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #1708):

sys/netinet6/ip6_input.c: revision 1.209 via patch
sys/netinet/ip_input.c: revision 1.390 via patch

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.319.6.1 09-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1563):
sys/netinet/ip_input.c: revision 1.366 (via patch)

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:
source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network
And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.
 1.319.4.10 28-Aug-2017  skrll Sync with HEAD
 1.319.4.9 05-Feb-2017  skrll Sync with HEAD
 1.319.4.8 05-Dec-2016  skrll Sync with HEAD
 1.319.4.7 05-Oct-2016  skrll Sync with HEAD
 1.319.4.6 09-Jul-2016  skrll Sync with HEAD
 1.319.4.5 19-Mar-2016  skrll Sync with HEAD
 1.319.4.4 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.319.4.3 22-Sep-2015  skrll Sync with HEAD
 1.319.4.2 06-Jun-2015  skrll Sync with HEAD
 1.319.4.1 06-Apr-2015  skrll Sync with HEAD
 1.319.2.2 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #1708):

sys/netinet6/ip6_input.c: revision 1.209 via patch
sys/netinet/ip_input.c: revision 1.390 via patch

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.319.2.1 09-Feb-2018  martin Pull up following revision(s) (requested by maxv in ticket #1563):
sys/netinet/ip_input.c: revision 1.366 (via patch)

Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.

By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.

It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:
source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network
And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.
 1.337.2.5 26-Apr-2017  pgoyette Sync with HEAD
 1.337.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.337.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.337.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.337.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.347.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.355.2.9 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1661):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.355.2.8 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.355.2.7 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #1378):

sys/netinet6/ip6_input.c: revision 1.209 (patch)
sys/netinet/ip_input.c: revision 1.390 (patch)

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.355.2.6 18-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #639):
sys/kern/uipc_socket.c: revision 1.258
sys/kern/uipc_socket.c: revision 1.259
sys/netinet/ip_input.c: revision 1.364 (via patch)
sys/netinet/ip_output.c: revision 1.289
sys/netinet/in.h: revision 1.102
sys/netinet/in_pcb.c: revision 1.181
share/man/man9/sockopt.9: revision 1.11
sys/netinet/in_pcb.h: revision 1.65
sys/sys/socketvar.h: revision 1.146
sys/kern/uipc_syscalls.c: revision 1.189
sys/netinet/ip_output.c: revision 1.290
share/man/man4/ip.4: revision 1.41
share/man/man4/ip.4: revision 1.42
sys/kern/uipc_syscalls.c: revision 1.190

pass valsize for getsockopt like we do for setsockopt
make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo

new sentence-new line

Remove comment now that the getsockopt code passes the size.

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).
(Tom Ivar Helbekkmo)
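
The size-based dispatch for IP_PKTINFO described in item 4 above can be sketched as follows. This is a userland illustration under stated assumptions, not the kernel code: struct sketch_pktinfo is a hypothetical stand-in for struct in_pktinfo, the sketch_ names are invented for the example, and the rejection of other sizes is an assumption that the log message does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct in_pktinfo; layout is illustrative. */
struct sketch_pktinfo {
	unsigned int ipi_ifindex;	/* interface index */
	unsigned int ipi_addr;		/* IPv4 address, network byte order */
};

enum sketch_action {
	SKETCH_RECV_ON,		/* Linux style: turn IP_RECVPKTINFO on */
	SKETCH_RECV_OFF,	/* Linux style: turn IP_RECVPKTINFO off */
	SKETCH_SET_SOURCE,	/* Solaris style: set default source */
	SKETCH_EINVAL		/* assumption: other sizes are rejected */
};

/* Decide what an IP_PKTINFO set request means from its value size. */
static enum sketch_action
sketch_pktinfo_setopt(const void *val, size_t len)
{
	if (len == sizeof(int)) {
		int on;

		memcpy(&on, val, sizeof(on));
		return on != 0 ? SKETCH_RECV_ON : SKETCH_RECV_OFF;
	}
	if (len == sizeof(struct sketch_pktinfo))
		return SKETCH_SET_SOURCE;
	return SKETCH_EINVAL;
}
```

The two accepted sizes differ in this sketch (an int versus a two-member structure), which is what lets a single option name serve both conventions.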
 1.355.2.5 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #588):
sys/netinet6/in6.c: revision 1.260
sys/netinet/in.c: revision 1.219
sys/netinet/wqinput.c: revision 1.4
sys/rump/net/lib/libnetinet/netinet_component.c: revision 1.11
sys/netinet/ip_input.c: revision 1.376
sys/netinet6/ip6_input.c: revision 1.193
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to reliably protect the network stack.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.355.2.4 12-Feb-2018  snj Pull up following revision(s) (requested by maxv in ticket #547):
sys/netinet/ip_input.c: 1.366
Disable ip_allowsrcrt and ip_forwsrcrt. Enabling them by default was a
completely dumb idea, because they have security implications.
By sending an IPv4 packet containing an LSRR option, an attacker will
cause the system to forward the packet to another IPv4 address - and
this way he white-washes the source of the packet.
It is also possible for an attacker to reach hidden networks: if a server
has a public address, and a private one on an internal network (network
which has several internal machines connected), the attacker can send a
packet with:
source = 0.0.0.0
destination = public address of the server
LSRR first address = address of a machine on the internal network
And the packet will be forwarded, by the server, to the internal machine,
in some cases even with the internal IP address of the server as a source.
 1.355.2.3 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at the
same time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-pasted code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, don't hold the lock; otherwise hold it.
This change requires adding KERNEL_LOCK to functions called from
if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in that case the lock isn't needed
because it's guaranteed that nothing else can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock, as in the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point nothing else modifies the list, so IFADDR_READER_FOREACH
is unnecessary. Using IFADDR_READER_FOREACH is harmless in general; however,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.355.2.2 10-Dec-2017  snj Pull up following revision(s) (requested by roy in ticket #390):
sys/netinet/ip_input.c: 1.363
sys/netinet6/ip6_input.c: 1.184-1.185
sys/netinet6/ip6_output.c: 1.194-1.195
sys/netinet6/in6_src.c: 1.83-1.84
Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
--
Attempt to restore v6 networking. Not 100% certain that these
changes are all that is needed, but they're certainly a big part of it
(especially the ip6_input.c change.)
--
Treat unvalidated addresses as deprecated in rule 3.
 1.355.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used together with the add
command in a single setkey configuration. This usage is normally
meaningless but useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years, and it seems unlikely that one will soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
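
The copy-and-replace pattern described above can be sketched in userland C
as follows. This is a minimal illustration, not the kernel code: struct
sa_model and sa_replace are hypothetical names, and the real change also
has to deal with list membership and reference counts, which are omitted.

```c
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for a sav. */
struct sa_model {
	unsigned int spi;
	unsigned long lifetime;
};

/*
 * Build an updated copy instead of modifying the live object.
 * On success *slot points at the new object and the old one is
 * returned so the caller can release it once nobody uses it.
 * On allocation failure the old object is left untouched and
 * NULL is returned.
 */
struct sa_model *
sa_replace(struct sa_model **slot, unsigned long new_lifetime)
{
	struct sa_model *old = *slot;
	struct sa_model *nsa = malloc(sizeof(*nsa));

	if (nsa == NULL)
		return NULL;
	*nsa = *old;			/* copy variables of the old sav */
	nsa->lifetime = new_lifetime;	/* apply the update to the copy */
	*slot = nsa;			/* replace the old one with the new one */
	return old;
}
```

Readers that still hold the old object never observe a half-updated state,
which is the property the in-place update lacked.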
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
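
For context, a plain memset on a buffer that is about to be freed may be
removed by the compiler as a dead store, which is why an explicit variant
is used for key material. The sketch below is a portable userland
stand-in, not the NetBSD implementation; secure_zero and memset_fn are
hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Calling memset through a volatile function pointer forces the
 * compiler to emit the call, since it cannot assume what the
 * pointer refers to at run time.
 */
static void *(*const volatile memset_fn)(void *, int, size_t) = memset;

/* Zero len bytes at buf in a way the optimizer will not elide. */
void
secure_zero(void *buf, size_t len)
{
	(void)memset_fn(buf, 0, len);
}
```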
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten from
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is already
sorted as we need.

The added key_validate_savlist validates that each sav list is indeed sorted
(run only if DEBUG because it's not cheap).
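
The invariant that such a validation checks can be sketched in plain C.
The names struct sav_entry and savlist_is_sorted are hypothetical
stand-ins, and the sort direction (newest first) is an assumption of this
sketch; the log only states that the lists are sorted by
lft_c->sadb_lifetime_addtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sav; only the ordering key is modeled. */
struct sav_entry {
	uint64_t addtime;	/* models lft_c->sadb_lifetime_addtime */
	struct sav_entry *next;
};

/*
 * Return true if the list is ordered by non-increasing addtime.
 * The walk is O(n), which is why a check like this is only
 * worth running in DEBUG builds.
 */
bool
savlist_is_sorted(const struct sav_entry *head)
{
	const struct sav_entry *p;

	for (p = head; p != NULL && p->next != NULL; p = p->next) {
		if (p->addtime < p->next->addtime)
			return false;
	}
	return true;
}
```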
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

The M_AUTHIPDGM flag is set on an mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication, and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of an mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav; however, an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
    comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
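
The arithmetic problem can be reproduced in a few lines. This is a sketch
under assumptions: struct comb_lifetime is a hypothetical two-field subset
of the real structure, and the function names and the hard limit passed in
are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the comb structure; field names follow the log. */
struct comb_lifetime {
	uint64_t sadb_comb_hard_addtime;
	uint64_t sadb_comb_soft_addtime;
};

/* Original pattern: scales the soft limit by itself, so 0 stays 0. */
void
setlifetime_old(struct comb_lifetime *comb)
{
	comb->sadb_comb_soft_addtime =
	    comb->sadb_comb_soft_addtime * 80 / 100;
}

/* Fixed pattern: derive the soft limit as 80% of the hard limit. */
void
setlifetime_new(struct comb_lifetime *comb, uint64_t hard_addtime)
{
	comb->sadb_comb_hard_addtime = hard_addtime;
	comb->sadb_comb_soft_addtime =
	    comb->sadb_comb_hard_addtime * 80 / 100;
}
```

With a zero-cleared structure the old form leaves the soft limit at 0,
while the new form yields 80% of whatever hard limit is supplied.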
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

With this change KEY_NEWSP is no longer called from softint
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain the SP while
holding key_so_mtx in, say, key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
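
A bounded queue for deferred work, as described by this and the previous
change, could look roughly like the sketch below. The cap of 4, the names
deferred_queue, deferred_enqueue and deferred_dequeue, and the choice to
reject new items when the queue is full are all assumptions made for
illustration; the log does not state the actual limit or overflow policy.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEFERRED_MAX 4	/* hypothetical cap, kept tiny for illustration */

/* Fixed-size ring of pending items awaiting the timer. */
struct deferred_queue {
	void *items[DEFERRED_MAX];
	size_t head;	/* index of the oldest item */
	size_t count;	/* number of queued items */
};

/* Queue an item for later delivery; refuse it when the cap is reached. */
bool
deferred_enqueue(struct deferred_queue *q, void *m)
{
	if (q->count >= DEFERRED_MAX)
		return false;	/* caller decides what to do, e.g. drop */
	q->items[(q->head + q->count) % DEFERRED_MAX] = m;
	q->count++;
	return true;
}

/* Called from the timer: take the oldest item, or NULL if empty. */
void *
deferred_dequeue(struct deferred_queue *q)
{
	void *m;

	if (q->count == 0)
		return NULL;
	m = q->items[q->head];
	q->head = (q->head + 1) % DEFERRED_MAX;
	q->count--;
	return m;
}
```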
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed, so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock). Of course, that provides no protection.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, which happens for example in the following
situation:

- Thread A calls localcount_drain, which calls xc_broadcast after releasing
the specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex; pserialize_perform also calls xc_broadcast
- Thread C (xc_thread), which runs the xcall callback of localcount_drain, tries
to take the mutex

The xc_broadcast of thread B doesn't start until the xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until the xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of the xc_broadcast of thread A, is stuck on the mutex.
As a result the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence with another
mutex, but adding another mutex makes the code complex, so fix the deadlock
another way: release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
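
The shape of the fix (drop the mutex around the perform step and use a flag
plus a condvar to keep that step single-entry) can be sketched with POSIX
threads. This is a userland model under assumptions, not the kernel code:
struct psz_gate and the psz_* functions are hypothetical names, and pthread
primitives stand in for the kernel's mutex and condvar.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the lock, condvar and flag guarding a list. */
struct psz_gate {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool performing;	/* true while some thread runs the perform step */
};

void
psz_gate_init(struct psz_gate *g)
{
	pthread_mutex_init(&g->mtx, NULL);
	pthread_cond_init(&g->cv, NULL);
	g->performing = false;
}

/*
 * Call with g->mtx held, after unlinking the item.  Waits until no
 * other thread is in the perform step, claims it, and returns with
 * g->mtx RELEASED so the perform step runs without the mutex.
 */
void
psz_perform_begin(struct psz_gate *g)
{
	while (g->performing)
		pthread_cond_wait(&g->cv, &g->mtx);
	g->performing = true;
	pthread_mutex_unlock(&g->mtx);
}

/* Call after the perform step.  Returns with g->mtx held again. */
void
psz_perform_end(struct psz_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	g->performing = false;
	pthread_cond_broadcast(&g->cv);
}
```

A caller would take the mutex, unlink the item, call psz_perform_begin, run
the equivalent of pserialize_perform, call psz_perform_end, then drain and
release the mutex. Because the mutex is not held during the perform step,
the thread C of the scenario above can take it and make progress.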
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example calling key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.376.2.7 18-Jan-2019  pgoyette Synch with HEAD
 1.376.2.6 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.376.2.5 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.376.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.376.2.3 21-May-2018  pgoyette Sync with HEAD
 1.376.2.2 02-May-2018  pgoyette Synch with HEAD
 1.376.2.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.384.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.384.2.1 10-Jun-2019  christos Sync with HEAD
 1.389.2.3 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1226):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.389.2.2 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@
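
The effect of the indirection can be modeled in userland C. This is a toy
model under assumptions: struct percpu_area, struct rtcache_model and the
percpu_area_* functions are hypothetical and do not reflect the percpu(9)
API; they only mimic "allocate a larger area, copy, destroy the old one".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the rtcache object. */
struct rtcache_model {
	int ro_dummy;
};

/* Toy model of one CPU's storage, which moves when it is enlarged. */
struct percpu_area {
	void *base;
	size_t size;
};

/* Enlarge the area: allocate a larger one, copy, destroy the old one. */
int
percpu_area_grow(struct percpu_area *pa, size_t newsize)
{
	void *n = malloc(newsize);

	if (n == NULL)
		return -1;
	memcpy(n, pa->base, pa->size);
	free(pa->base);		/* pointers into the old area now dangle */
	pa->base = n;
	pa->size = newsize;
	return 0;
}

/* Keep only a pointer in the area; the rtcache itself lives elsewhere. */
int
percpu_area_init_indirect(struct percpu_area *pa)
{
	struct rtcache_model *ro = calloc(1, sizeof(*ro));

	if (ro == NULL)
		return -1;
	pa->base = malloc(sizeof(ro));
	if (pa->base == NULL) {
		free(ro);
		return -1;
	}
	memcpy(pa->base, &ro, sizeof(ro));
	pa->size = sizeof(ro);
	return 0;
}

/* Fetch the rtcache pointer; it stays valid across percpu_area_grow(). */
struct rtcache_model *
percpu_area_get(const struct percpu_area *pa)
{
	struct rtcache_model *ro;

	memcpy(&ro, pa->base, sizeof(ro));
	return ro;
}
```

Had the rtcache been stored directly in the area, a pointer obtained before
the growth would refer to freed memory afterwards; with the indirection only
the small pointer slot moves.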

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.389.2.1 17-Sep-2019  martin Pull up following revision(s) (requested by bouyer in ticket #208):

sys/netinet6/ip6_input.c: revision 1.209
sys/netinet/ip_input.c: revision 1.390

Packet filters can return an mbuf chain with fragmented headers, so
m_pullup() it if needed and remove the KASSERT()s.
 1.397.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.402.4.2 29-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1140):

sys/netinet/ip_output.c: revision 1.330
sys/netinet/sctp_output.c: revision 1.39
sys/netinet/ip_mroute.c: revision 1.166
sys/netipsec/ipsecif.c: revision 1.24
sys/netipsec/xform_ipip.c: revision 1.80
sys/netinet/ip_output.c: revision 1.327
sys/netinet/ip_output.c: revision 1.328
sys/netinet/ip_input.c: revision 1.406
sys/netinet/ip_output.c: revision 1.329
sys/netinet/in_var.h: revision 1.105

in: get rid of unused argument from ip_newid() and ip_newid_range()

in: take a reference of ifp on IP_ROUTETOIF
The ifp could be released after ia4_release(ia).

in: narrow the scope of ifa in ip_output (NFC)

sctp: follow the recent change of ip_newid()

in: avoid racy ifa_acquire(rt->rt_ifa) in ip_output()
If a rtentry is being destroyed asynchronously, the ifa referenced by rt_ifa
can be destructed, and calling ifa_acquire(rt->rt_ifa) aborts with a
KASSERT failure. Fortunately, the ifa is not actually freed because of
the reference from rt_ifa, so it remains usable (except by some functions like
psref) as long as the rtentry is held.
PR kern/59527

in: avoid racy ia4_acquire(ifatoia(rt->rt_ifa)) in ip_rtaddr()
Same as the case of ip_output(), it's racy and should be avoided.
PR kern/59527
 1.402.4.1 14-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1137):

sys/netinet/ip_input.c: revision 1.405

in: avoid packet looping on incoming packets destined to an initializing
address

The initialization of an IPv4 address is done by adding a connected route and
a local route (if necessary), and then publishing itself by adding it to the
global list (and the global hashtable). Thus, there can exist a route with an
address that is not published. This inconsistent state allows an incoming
packet destined to a host address which is not published but has a
local route to be forwarded and routed to a loopback interface. This results
in forwarding the packet back to ip_input, that is, packet looping.

To avoid the situation, prohibit packets being forwarded via a local route.

This is a workaround for "IPv4 address initialization atomicity" in
doc/TODO.smpnet.
 1.403.2.1 02-Aug-2025  perseant Sync with HEAD
 1.17 04-Mar-2002  sommerfeld The "gif*" tunnelling interface does everything ipip does.
Move usage example from ipip.4 to gif.4
Excise ipip and stitch up the scars.
 1.16 13-Nov-2001  lukem add RCSIDs
 1.15 13-Apr-2001  thorpej branches: 1.15.2;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.14 17-Jan-2001  thorpej branches: 1.14.2;
Fix a rather annoying problem where the sockaddr_dl which holds
the link level name for the interface (ifp->if_sadl) is allocated
before ifp->if_addrlen is initialized, which could lead to allocating
too little space for the link level address.

Do this by splitting allocation of the link level name out of
if_attach() and into if_alloc_sadl(), which is normally called
by functions like ether_ifattach(). Network interfaces which
don't have a link-specific attach routine must call if_alloc_sadl()
themselves (example: gif).

Link level names are freed by if_free_sadl(), which can be called
from e.g. ether_ifdetach(). Drivers never need call if_free_sadl()
themselves as if_detach() will do it if it is not already done.

While here, add the ability to pass an AF_LINK address to
SIOCSIFADDR in ether_ioctl() (this is what caused me to notice
the problem that the above fixes).
 1.13 18-Dec-2000  thorpej Pull in BPF includes.
 1.12 18-Dec-2000  thorpej Fill in if_dlt.
 1.11 12-Dec-2000  thorpej Adapt to bpfattach() changes, and further centralize the bpfattach()
and bpfdetach() calls into link-type subroutines where possible.
 1.10 19-Apr-2000  itojun introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.9 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.8 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.7 26-Aug-1999  itojun branches: 1.7.2; 1.7.8;
clear m->m_pkthdr.rcvif before calling ip_output().
the member is used to pass struct socket to ip{,6}_output for ipsec decisions.

(i agree it is kind of ugly. we need to modify struct mbuf if we are
to do better - which seems to me a bit too much)
 1.6 04-Apr-1999  tron - Make sure that interface can't be marked up before a route to the remote
tunnel end was found.
- Flush route and reset MTU if interface is marked down.
 1.5 04-Apr-1999  tron Avoid kernel panic if interface is configured before a route to the
remote of the tunnel can be found.

XXX If you manually mark the interface as "UP" and set the MTU later
XXX sending a packet will still cause a kernel panic.
 1.4 02-Apr-1999  hwr Setting of source and destination IP is not done by
passing SIOCSIFADDR/SIOCSIFDSTADDR, but by passing the addresses in
the appropriate structs.
One of the mysteries of ifconfig IMHO...

Should fix kern/6899.
 1.3 02-Feb-1999  thorpej branches: 1.3.2;
Set the tunnel destination address correctly. Should fix PR #6899.
 1.2 13-Jan-1999  thorpej Use the count supplied to the pseudo-device attach routine to dynamically
allocate (once) the ipip_softc array; don't assume NIPIP contains the count.
 1.1 11-Jan-1999  thorpej Separate out the IP-in-IP implementation from the GRE code. This cleans
up the interface to ip_mroute.c somewhat, and properly separates IP-IP
from GRE. (They are similar, but they are different protocols, and should
not be implemented in the same place.)
 1.3.2.1 04-Apr-1999  tron branches: 1.3.2.1.2; 1.3.2.1.4;
Pull up revisions 1.4, 1.5 and 1.6 from trunk.
 1.3.2.1.4.1 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.3.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.7.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.7.2.4 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.7.2.3 05-Jan-2001  bouyer Sync with HEAD
 1.7.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.7.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.14.2.3 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.14.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.14.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.15.2.2 16-Mar-2002  jdolecek Catch up with -current.
 1.15.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.3 04-Mar-2002  sommerfeld The "gif*" tunnelling interface does everything ipip does.
Move usage example from ipip.4 to gif.4
Excise ipip and stitch up the scars.
 1.2 19-Apr-2000  itojun branches: 1.2.6; 1.2.8;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.1 11-Jan-1999  thorpej branches: 1.1.8;
Separate out the IP-in-IP implementation from the GRE code. This cleans
up the interface to ip_mroute.c somewhat, and properly separates IP-IP
from GRE. (They are similar, but they are different protocols, and should
not be implemented in the same place.)
 1.1.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.2.8.1 16-Mar-2002  jdolecek Catch up with -current.
 1.2.6.1 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.5 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.4 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.3 28-Mar-2004  martti branches: 1.3.2;
Upgraded IPFilter to 4.1.1
 1.2 01-Apr-2002  jdolecek branches: 1.2.10;
add __KERNEL_RCSID()
 1.1 24-Jan-2002  martti branches: 1.1.1;
Initial revision
 1.1.1.3 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.2 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.1 24-Jan-2002  martti branches: 1.1.1.1.2; 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 3.4.23
 1.1.1.1.6.3 17-Apr-2002  nathanw Catch up to -current.
 1.1.1.1.6.2 28-Feb-2002  nathanw Catch up to -current.
 1.1.1.1.6.1 24-Jan-2002  nathanw file ip_ipsec_pxy.c was added on branch nathanw_sa on 2002-02-28 04:15:08 +0000
 1.1.1.1.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.1.1.4.3 16-Mar-2002  jdolecek Catch up with -current.
 1.1.1.1.4.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.1.1.1.4.1 24-Jan-2002  jdolecek file ip_ipsec_pxy.c was added on branch kqueue on 2002-02-11 20:10:35 +0000
 1.1.1.1.2.4 27-Nov-2002  itojun sys/netinet/ip_h323_pxy.c via patch
sys/netinet/ip_ipsec_pxy.c via patch
sys/netinet/ip_netbios_pxy.c via patch

Fix compilation on a.out systems.

(Thorsten Frueauf)
 1.1.1.1.2.3 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.1.1.1.2.2 09-Feb-2002  he Pull up revision 1.1 (new, requested by martti):
Updated IPFilter to 3.4.23
 1.1.1.1.2.1 24-Jan-2002  he file ip_ipsec_pxy.c was added on branch netbsd-1-5 on 2002-02-09 16:55:22 +0000
 1.2.10.3 19-Oct-2004  skrll Sync with HEAD
 1.2.10.2 05-Aug-2004  skrll Fix merge mistakes.
 1.2.10.1 03-Aug-2004  skrll Sync with HEAD
 1.3.2.1 13-Aug-2004  jmc branches: 1.3.2.1.2;
Pullup rev 1.4 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.3.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.5 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.2 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.2 23-Jul-2004  martti branches: 1.1.1.2.2;
Import IPFilter 4.1.3
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.2;
Import IPFilter 4.1.1
 1.1.1.2.2.6 19-Oct-2004  skrll Sync with HEAD
 1.1.1.2.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.2.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.2.2.3 05-Aug-2004  skrll Fix merge mistakes.
 1.1.1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.2.2.1 23-Jul-2004  skrll file ip_irc_pxy.c was added on branch ktrace-lwp on 2004-08-03 10:54:40 +0000
 1.1.1.1.2.1 13-Aug-2004  jmc branches: 1.1.1.1.2.1.2;
Pullup rev 1.1.1.2 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.1.1.1.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.26 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.25 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.24 28-Mar-2004  martti branches: 1.24.2;
Upgraded IPFilter to 4.1.1
 1.23 25-Sep-2002  martti branches: 1.23.6;
Fix ipmon problems on 64-bit platforms (PR#17403 and PR#17404).
 1.22 19-Sep-2002  martti Resync with official IPF
 1.21 01-Jul-2002  christos Fix iplog problem on sparc64 [from Tomi Nylund]
1. size_t is 64 bits, so use a u_32_t for iplused
2. microtime() and friends expect a struct timeval,
passing the first of two unsigned longs will not cut it.
 1.20 09-Jun-2002  itojun whitespace
 1.19 02-May-2002  martti branches: 1.19.2; 1.19.4;
Upgraded IPFilter to 3.4.27
 1.18 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.17 24-Jan-2002  martti Re-sync with IPFilter
 1.16 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.15 13-Nov-2001  lukem add RCSIDs
 1.14 28-Sep-2001  chs don't depend on other headers to include sys/proc.h for us.
 1.13 26-Mar-2001  mike branches: 1.13.2; 1.13.4;
Resolve conflicts.
 1.12 05-Feb-2001  chs branches: 1.12.2;
expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.11 09-Aug-2000  veego Resolve conflicts.
 1.10 03-May-2000  veego branches: 1.10.4;
Resolve conflicts.
 1.9 30-Mar-2000  augustss Remove register declarations.
 1.8 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.7 10-Dec-1998  christos branches: 1.7.2; 1.7.8; 1.7.14;
defopt IPFILTER_LOG
 1.6 22-Nov-1998  mrg merge ipf 3.2.10
 1.5 12-Jul-1998  veego Resolve conflicts from the import.
 1.4 17-May-1998  veego Resolve conflicts
 1.3 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.2 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.1 21-Sep-1997  veego branches: 1.1.1;
Initial revision
 1.1.1.15 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.14 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.13 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.12 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.11 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.10 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.9 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.8 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.7 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.6 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.5 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.4 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.3 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.2 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.1 21-Sep-1997  veego branches: 1.1.1.1.2; 1.1.1.1.4;
Import ip-filter 3.2beta5
 1.1.1.1.4.4 24-Nov-1998  cgd pull up rev(s) 1.6 from trunk (ipfilter 3.2.10). (mrg)
 1.1.1.1.4.3 22-Jul-1998  mellon Pull up 1.5 (veego)
 1.1.1.1.4.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.1.1.1.4.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.1.1.1.2.2 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.1.1.1.2.1 21-Sep-1997  thorpej file ip_log.c was added on branch marc-pcmcia on 1997-09-22 06:34:10 +0000
 1.7.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.8.3 27-Mar-2001  bouyer Sync with HEAD.
 1.7.8.2 11-Feb-2001  bouyer Sync with HEAD.
 1.7.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.7.2.1 20-Dec-1999  he Pull up revision 1.8 (requested by darrenr):
Update IPF to version 3.3.5.
 1.10.4.3 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.10.4.2 09-Feb-2002  he Pull up revisions 1.12-1.17 (requested by martti):
Updated IPFilter to 3.4.23.
 1.10.4.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Casper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Casper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.12.2.10 18-Oct-2002  nathanw Catch up to -current.
 1.12.2.9 20-Sep-2002  thorpej Sync with HEAD.
 1.12.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.12.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.12.2.6 04-May-2002  thorpej Update from trunk.
 1.12.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.12.2.4 28-Feb-2002  nathanw Catch up to -current.
 1.12.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.12.2.2 08-Oct-2001  nathanw Catch up to -current.
 1.12.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.13.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.13.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.13.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.13.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.13.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.13.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.19.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.19.2.2 15-Jul-2002  gehenna catch up with -current.
 1.19.2.1 20-Jun-2002  gehenna catch up with -current.
 1.23.6.2 19-Oct-2004  skrll Sync with HEAD
 1.23.6.1 03-Aug-2004  skrll Sync with HEAD
 1.24.2.1 13-Aug-2004  jmc branches: 1.24.2.1.2;
Pullup rev 1.25 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.24.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.26 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.4 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.3 23-Jul-2004  martti branches: 1.3.2;
Upgraded IPFilter to 4.1.3
 1.2 01-Apr-2004  martin A few more ioctl vs. copyin changes, spotted by Bill Studenmund.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.2 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.2;
Import IPFilter 4.1.1
 1.1.1.1.2.2 13-Aug-2004  jmc branches: 1.1.1.1.2.2.2;
Pullup rev 1.3 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.1.1.1.2.1 02-Apr-2004  tron Pull up revision 1.2 (requested by martin in ticket #46):
A few more ioctl vs. copyin changes, spotted by Bill Studenmund.
 1.1.1.1.2.2.2.1 06-Feb-2005  jmc Pull up revision 1.4 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.3.2.6 19-Oct-2004  skrll Sync with HEAD
 1.3.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.3.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.3.2.3 05-Aug-2004  skrll Fix merge mistakes.
 1.3.2.2 03-Aug-2004  skrll Sync with HEAD
 1.3.2.1 23-Jul-2004  skrll file ip_lookup.c was added on branch ktrace-lwp on 2004-08-03 10:54:40 +0000
 1.2 05-Oct-2004  yamt move netinet/ip_lookup.h -> dist/ipf/netinet/ip_lookup.h.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 4.1.1
 1.1.1.1.6.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.1.1.1.4.5 19-Oct-2004  skrll Sync with HEAD
 1.1.1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.1.4.1 28-Mar-2004  skrll file ip_lookup.h was added on branch ktrace-lwp on 2004-08-03 10:54:40 +0000
 1.166 11-Jun-2025  ozaki-r in: get rid of unused argument from ip_newid() and ip_newid_range()
 1.165 15-Mar-2022  andvar branches: 1.165.4; 1.165.10;
s/heaader/header/
 1.164 12-Nov-2020  kardel PR kern/55779:

restore non-destructive guarantee of ip_mforward() mbuf
argument. This avoids generating invalid UDP checksums
on multicast packets in ip_output().

XXX the root cause of the misguided fix in 2008 should be
XXX investigated
 1.163 14-Sep-2018  maxv branches: 1.163.4; 1.163.12;
Use non-variadic function pointer in protosw::pr_input.
 1.162 11-Jul-2018  martin Add missing <netinet/in_offload.h> include.
 1.161 11-Jul-2018  maxv Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.
 1.160 21-Jun-2018  knakahara branches: 1.160.2;
sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
while holding softnet_lock: in the !defined(NET_MPSAFE) case it is
acquired in ipintr(); otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.
However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.159 14-May-2018  maxv Don't crash if there is no inner IP header.
 1.158 07-May-2018  maxv Fix possible buffer overflow. We need to make sure the inner IPv4 packet
doesn't have options, because we validate only an option-less header.
 1.157 11-Apr-2018  maxv Add XXX.
 1.156 11-Apr-2018  maxv Add XXX.
 1.155 21-Mar-2018  roy Sprinkle more soroverflow().
 1.154 09-Feb-2018  maxv branches: 1.154.2;
Style (realign everything correctly), and fix a typo.
 1.153 07-Feb-2018  maxv Style and constify.
 1.152 07-Feb-2018  maxv More style. No functional change.
 1.151 07-Feb-2018  maxv Remove parentheses in return statements. No functional change.
 1.150 07-Feb-2018  maxv Style and remove unused macros. More to come.
 1.149 07-Feb-2018  maxv Remove RSVP_ISI, that's mostly dead code. FreeBSD and OpenBSD too removed
it; FreeBSD kept some pieces but they are mostly no-ops.

Sent on tech-net@, no comment.
 1.148 15-Nov-2017  knakahara Add argument to encapsw->pr_input() instead of m_tag.
 1.147 23-Jul-2017  para kmem_intr_free kmem_intr_[z]alloced memory

the underlying pools are the same but api-wise those should match
 1.146 24-Jan-2017  ozaki-r branches: 1.146.6;
Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.145 11-Jan-2017  ozaki-r branches: 1.145.2;
Get rid of unnecessary header inclusions
 1.144 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.143 04-Jul-2016  knakahara branches: 1.143.2;
make encap_lock_{enter,exit} interruptible.
 1.142 04-Jul-2016  knakahara let gif(4) promise softint(9) contract (2/2) : ip_encap side

The last commit did not take care of encaptab. This commit fixes the encaptab race;
encaptab is used by more than just gif(4).
 1.141 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time while we access it via them. Thus, to be safe, we need to
look up the ifnet object by if_index every time.
 1.140 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) the receiving interface of an mbuf.
These functions are the counterparts of m_get_rcvif, which will come in another
commit. They hide the internals of rcvif operations and reduce the diff of
the upcoming change.

No functional change.
 1.139 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.138 26-Jan-2016  knakahara eliminate variable argument in encapsw
 1.137 26-Jan-2016  knakahara implement encapsw instead of protosw and uniform prototype.

suggested and advised by riastradh@n.o, thanks.

BTW, It seems in_stf_input() had bugs...
 1.136 22-Jan-2016  riastradh Back out previous change to introduce struct encapsw.

This change was intended, but Nakahara-san had already made a better
one locally! So I'll let him commit that one, and I'll try not to
step on anyone's toes again.
 1.135 22-Jan-2016  riastradh Don't abuse struct protosw for ip_encap -- introduce struct encapsw.

Mostly mechanical change to replace it, culling some now-needless
boilerplate around all the users.

This does not substantively change the ip_encap API or eliminate
abuse of sketchy pointer casts -- that will come later, and will be
easier now that it is not tangled up with struct protosw.
 1.134 20-Jan-2016  riastradh Eliminate struct protosw::pr_output.

You can't use this unless you know what it is a priori: the formal
prototype is variadic, and the different instances (e.g., ip_output,
route_output) have different real prototypes.

Convert the only user of it, raw_send in net/raw_cb.c, to take an
explicit callback argument. Convert the only instances of it,
route_output and key_output, to such explicit callbacks for raw_send.
Use assertions to make sure the conversion to explicit callbacks is
warranted.

Discussed on tech-net with no objections:
https://mail-index.netbsd.org/tech-net/2016/01/16/msg005484.html
 1.133 17-Jan-2016  christos PR/50670: David Binderman: Tidy up debugging printfs to avoid if else confusion.
 1.132 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.131 18-Oct-2014  snj branches: 1.131.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.130 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.129 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.128 14-Sep-2013  martin branches: 1.128.2;
Remove unused variable
 1.127 05-Jun-2013  christos branches: 1.127.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.126 24-Sep-2012  msaitoh Add missing "\n" in log(9)
 1.125 01-May-2012  martin branches: 1.125.2;
Explicitly include <sys/kmem.h>
 1.124 30-Apr-2012  rmind - Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.123 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.122 19-Dec-2011  drochner branches: 1.122.2;
rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.121 19-Oct-2011  dyoung branches: 1.121.2; 1.121.6;
Use if_addr_init() and if_mcast_op() instead of ifp->if_ioctl().
 1.120 31-Aug-2011  plunky NULL does not need a cast
 1.119 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.118 18-Mar-2009  cegger bzero -> memset
 1.117 19-Dec-2008  cegger branches: 1.117.2;
use M_ZERO on malloc() and remove subsequent bzero().
 1.116 01-Oct-2008  rmind branches: 1.116.2;
PR/39664: Dave Huang: ip_mrouter_done: free hash using hashdone(9).
 1.115 06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.114 22-May-2008  dyoung branches: 1.114.4;
Don't cast to void * unnecessarily.
 1.113 08-May-2008  taca Make sure to clear csum_flags before forwarding the packet.

This change should fix the DIAGNOSTIC kernel panic when the machine acts
as a multicast router.

Advised by tls@ and approved by thorpej@.
 1.112 05-May-2008  ad branches: 1.112.2;
- Convert hashinit() to use kmem_alloc(). The hash tables can be large
and it's better to not have them in kmem_map.
- Convert a couple of minor items along the way to kmem_alloc().
- Fix some memory leaks.
 1.111 06-Feb-2008  matt branches: 1.111.6; 1.111.8; 1.111.10;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
 1.110 21-Dec-2007  matt Add fix for ip_id information leakage. Since the leakage information is
primarily used with TCP SYN and RST packets and such packets are less than
the smallest sized packet that an IP stack is allowed to fragment, we simply
set ip_id to 0 for all packets 68 bytes or less.
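The rule described in 1.110 can be sketched as follows. This is only an illustration of the size test from the log message, not the kernel code: the names `choose_ip_id`, `MAX_UNFRAGMENTABLE_LEN` and the `next_id` callback are hypothetical, and the real ID generator is whatever the stack is configured with.

```python
# Sketch only: packets of 68 bytes or less get ip_id 0, per the 1.110 log entry.
# All names here are hypothetical and do not come from the NetBSD source.

MAX_UNFRAGMENTABLE_LEN = 68  # threshold quoted in the log message


def choose_ip_id(packet_len, next_id):
    """Return the IP ID to place in a packet of packet_len bytes.

    next_id is a zero-argument callable standing in for the real ID
    generator; it is consulted only for packets above the threshold.
    """
    if packet_len <= MAX_UNFRAGMENTABLE_LEN:
        return 0
    # The IPv4 identification field is 16 bits wide.
    return next_id() & 0xFFFF
```

Under these assumptions a 40-byte TCP SYN would carry ID 0, while a larger datagram would still draw a value from the generator.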
 1.109 27-Nov-2007  christos branches: 1.109.2; 1.109.6;
require that the options argument is the right size, not that it is greater
or equal to the requested size. Suggested by Matt Thomas.
 1.108 02-Sep-2007  dyoung branches: 1.108.6;
m_copym(..., 0, M_COPYALL, ...) -> m_copypacket(..., ...).
 1.107 02-Sep-2007  dyoung m_copy() was deprecated, apparently, long ago. m_copy(...) ->
m_copym(..., M_DONTWAIT).
 1.106 31-Aug-2007  dyoung Fix bug in last: add missing ampersand.
 1.105 31-Aug-2007  dyoung Stop sharing a sockaddr_in template among multicast routines,
because that's just going to cause problems down the road. (Suppose
we can have two CPUs in the network stack someday?) Instead, use
sockaddr_in_init() to initialize a sockaddr_in on the stack.

Use ifreq_setaddr() to initialize ifreq.ifr_addr.
 1.104 09-Jul-2007  ad branches: 1.104.2; 1.104.6; 1.104.8;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.103 13-Jun-2007  christos PR/36484: Pavlin Radoslavov: PIM Register in-kernel encapsulation IP_DF
setting is incorrect
 1.102 25-Apr-2007  dyoung Get rid of some gratuitous casts and join some lines.
 1.101 04-Mar-2007  christos branches: 1.101.2; 1.101.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.100 16-Nov-2006  christos branches: 1.100.2; 1.100.4; 1.100.8;
__unused removal on arguments; approved by core.
 1.99 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.98 30-Aug-2006  christos branches: 1.98.2; 1.98.4;
Fix initializers.
 1.97 25-Apr-2006  liamjfoy - use MAXTTL

ok christos@
 1.96 11-Dec-2005  christos branches: 1.96.4; 1.96.6; 1.96.8; 1.96.10; 1.96.12;
merge ktrace-lwp.
 1.95 03-Aug-2005  gdt Restore to working order; this has apparently been nonworking since
the decapsulator dispatch changes in 2001. Problems found and fixed
by Christine Jones of BBN. Specifically:

Check for a packet's protocol to be ENCAP_PROTO, not AF_INET.

Remove one-back cache for last vif, because vif_encapcheck is called
for each vif, rather than being expected to find the appropriate vif.
The cache usage caused packets to be input on the wrong vif and hence
usually dropped.

In vif_encapcheck, verify the local source as well. While mrouted
endeavors not to create multiple tunnels with a peer, a packet
arriving with the wrong local address is still wrong and should not be
accepted. (This is a correctness nit, not a security issue.) Order
checks to fail quickly for packets being checked to see if they match
a vif other than the one they belong on (essentially, check peer
source address in outer header first).

Claim 69 bits of match (32 each from outer src/dst and 5 from checking
that inner dst is within 224/5). This should result in the vif having
a higher priority for multicast packets compared to a parallel gif(4)
tunnel, which both seems appropriate if both are configured and
seems to match the semantics expected by the decapsulator dispatch
machinery.

(These changes were made in 2.99.15 and about a dozen nodes are
running them with many vifs. ip_mroute.c has not changed
significantly since then (February 2005) and the changes applied
cleanly to current and compile cleanly.)
 1.94 06-Jun-2005  martin branches: 1.94.2;
Since we decided "const struct mbuf *" would not do the right thing (tm),
remove ~all const from mbuf pointers.
 1.93 06-Jun-2005  martin Adapt to ip_encap.h constification.
 1.92 06-Jun-2005  christos make this compile again.
 1.91 29-May-2005  christos change casts to proper unconst. mark XXXUNCONST
 1.90 26-Feb-2005  perry nuke trailing whitespace
 1.89 03-Feb-2005  perry ANSIify function declarations
 1.88 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.87 15-Jan-2005  manu branches: 1.87.2; 1.87.4;
Duplicate nested if statement in PIM code (from the OpenBSD tree)
 1.86 04-Sep-2004  manu IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.85 26-Apr-2004  matt Remove #else clause of __STDC__
 1.84 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.83 21-Apr-2004  itojun kill sprintf, use snprintf
 1.82 19-Nov-2003  jonathan Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.
 1.81 17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.80 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.79 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.78 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.77 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.76 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.75 30-Jun-2003  itojun branches: 1.75.2;
better ip_mrouter_detach(). by ono@kame
 1.74 26-Jun-2003  itojun fix stats to meet 4.4BSD practice
 1.73 26-Jun-2003  itojun expire mrt if mrt_stall goes empty. ono@kame
 1.72 26-Jun-2003  itojun cleanup multicast routing stuff on if_detach().
XXX sideeffect to running instance of multicast routing daemon unknown
 1.71 14-May-2003  itojun more KNF
 1.70 14-May-2003  itojun more KNF
 1.69 14-May-2003  itojun wrap multiline macro by do {} while (0)
 1.68 14-May-2003  itojun constcond
 1.67 14-May-2003  itojun KNF
 1.66 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.65 19-Jan-2003  simonb Remove variables that are only assigned to but not referenced.
 1.64 05-Nov-2002  fair Add required IPSEC #include files that prevented this from compiling.
 1.63 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.62 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.61 31-Jul-2002  itojun bring back old copyright notice lost in rev 1.15 (which is the authors' intent).
 1.60 09-Jun-2002  itojun whitespace
 1.59 04-Mar-2002  sommerfeld branches: 1.59.6; 1.59.8;
The "gif*" tunnelling interface does everything ipip does.
Move usage example from ipip.4 to gif.4
Excise ipip and stitch up the scars.
 1.58 13-Nov-2001  lukem add RCSIDs
 1.57 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.56 22-Jul-2001  wiz branches: 1.56.4;
seperate -> separate
 1.55 02-Jun-2001  thorpej branches: 1.55.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
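The idea in 1.55 of pre-computing the pseudo-header checksum and finishing the sum later can be sketched like this. It is a minimal illustration of the Internet one's-complement checksum, assuming an IPv4 pseudo-header of source address, destination address, zero byte, protocol and length; the function names are hypothetical and this is not the kernel implementation.

```python
# Sketch only: cache the pseudo-header sum, then fold in the segment later.
# Function names are hypothetical; the arithmetic is the standard Internet
# one's-complement checksum.
import socket
import struct


def ones_sum(data, acc=0):
    """Add 16-bit big-endian words of data to acc with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    for (word,) in struct.iter_unpack("!H", data):
        acc += word
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc


def pseudo_header_sum(src, dst, proto, length):
    """Partial sum over the IPv4 pseudo-header; this is the value to cache."""
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, proto, length))
    return ones_sum(pseudo)


def finish_checksum(cached_sum, segment):
    """Complete the checksum later, starting from the cached partial sum."""
    return ~ones_sum(segment, cached_sum) & 0xFFFF
```

Because one's-complement addition is order-independent, finishing from the cached value gives the same result as summing the pseudo-header and the segment in one pass, which is what lets the final step be deferred or handed to hardware.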
 1.54 08-May-2001  itojun pull encapsulated packet for vif* via ip_encap framework.
 1.53 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.52 24-Jan-2001  itojun branches: 1.52.2;
- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.51 08-Nov-2000  ad Update for hashinit() change.
 1.50 19-Apr-2000  itojun branches: 1.50.4;
introduce sys/netinet/ip_encap.c, to dispatch inbound packets
to protocol handlers, based on src/dst (for ip proto #4/41).
see comment in ip_encap.c for details of the problem we have.
there are too many protocol specs for ip proto #4/41.
backward compatibility with MROUTING case is now provided in ip_encap.c.

fix ipip to work with gif (using ip_encap.c). sorry for breakage.

gif now uses ip_encap.c.

introduce stf pseudo interface (implements 6to4, another IPv6-over-IPv4 code
with ip proto #41).
 1.49 16-Apr-2000  chs remove an LBL ifdef that we can't turn on anyway.
 1.48 30-Mar-2000  augustss Remove register declarations.
 1.47 23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.46 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.45 01-Feb-2000  thorpej Fix a couple of whitespace glitches.
 1.44 09-Jul-1999  thorpej branches: 1.44.2;
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.43 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.42 27-Mar-1999  nathanw branches: 1.42.4; 1.42.6;
Alpha printf format fixes.
Closes PR kern/7258.
 1.41 02-Feb-1999  marc remove gre_softc declaration; the symbol is no longer used in this
file.
 1.40 01-Feb-1999  mycroft Clear mfchashtbl after it's deallocated, to kill a stray pointer. Fixes PR
5400.
 1.39 11-Jan-1999  thorpej Adjust for the new IP-IP input path. mrt_ipip_input() is called from
ipip_input(), and returns non-zero if mrt_ipip_input() handled the
packet.

XXX Eventually, the multicast code should probably use regular IP-IP
XXX `interfaces', but mrouted knows about the VIF table, etc.
 1.38 22-Dec-1998  thorpej ipip_input() -> mrt_ipip_input().
 1.37 13-Sep-1998  hwr branches: 1.37.4;
Add a gre tunnel pseudo network device. Gre = generic route encapsulation.
This device shows up like any other network interface and can be used to
tunnel L3 protocols as e.g. IP over IP.
 1.36 07-Feb-1998  chs add flags arg to hashinit(), to pass to malloc().
 1.35 14-Aug-1997  mycroft branches: 1.35.4;
Make sure we install the route returned by the upcall before trying to
forward any queued packets. From Bill Fenner, via Brad Karp.
 1.34 13-Oct-1996  christos branches: 1.34.10;
backout previous kprintf changes
 1.33 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.32 14-Sep-1996  mrg remove an unused variable.
 1.31 09-Sep-1996  mycroft Rework the token bucket filter to use a list of packets rather than a static
array. Also, fix several memory leaks. From Bill Fenner.
 1.30 09-Sep-1996  mycroft Cosmetic changes, some from Bill Fenner.
 1.29 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.28 23-Jun-1996  mycroft Return ENOPROTOOPT rather than picking pseudo-random error values.
Don't allow SIOCGET{VIF,SG}CNT from sockets other than the multicast router.
Restructure rip_ctloutput() like ip_ctloutput(), and fix memory leaks.
 1.27 07-May-1996  thorpej branches: 1.27.4;
Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.26 16-Mar-1996  christos Fix printf format args.
 1.25 13-Feb-1996  christos netinet prototypes
 1.24 12-Aug-1995  mycroft splnet --> splsoftnet
 1.23 12-Jun-1995  mycroft Clear the MFC entry's statistical counters when doing an upcall.
 1.22 04-Jun-1995  mycroft Simplify ipip_input() a bit. Don't blow away the vif cache if someone sends
us a bogus packet.
 1.21 04-Jun-1995  mycroft Simplify tbf_control() a bit.
 1.20 04-Jun-1995  mycroft Eliminate compiler warnings.
 1.19 04-Jun-1995  mycroft For consistency, set sin_len for SIOC{ADD,DEL}MULTI.
 1.18 04-Jun-1995  mycroft Clean up many more casts.
 1.17 02-Jun-1995  mycroft Dynamically allocate the deencapsulation interfaces. Abstract the code to
reset a vif into a separate function.
 1.16 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.15 31-May-1995  mycroft Integrate multicast 3.5 distribution, with several bugs fixed and general
cleanup. This is a (working) snapshot of work in progress.
 1.14 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.13 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.12 09-Jun-1994  brezak Update to version 2 mrouting; from pre-4.4lite NetBSD + 4.4 mods
 1.11 04-Jun-1994  mycroft Remove a spurious splx().
 1.10 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.9 10-Feb-1994  mycroft Format police.
 1.8 10-Feb-1994  mycroft Deprecate af.h.
 1.7 29-Jan-1994  brezak Fix some cases of NOT dealing with m_pkthdr's. This code is still suspect though, at least this fixes some panics.
 1.6 18-Jan-1994  brezak Fix some prototype detected warnings/errors.
 1.5 18-Jan-1994  brezak Patch for ip-multicast bugs from mccanne@ee.lbl.gov (Steven McCanne)
 1.4 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.3 09-Jan-1994  mycroft Prototype the rest.
 1.2 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.1 06-Dec-1993  hpeyerl branches: 1.1.1;
multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.1.1.1 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.27.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.34.10.1 23-Aug-1997  thorpej Update marc-pcmcia branch from trunk.
 1.35.4.1 03-Feb-1999  cgd pull up rev 1.40 from trunk (mycroft)
 1.37.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.42.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.42.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.42.4.2 02-Aug-1999  thorpej Update from trunk.
 1.42.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.44.2.4 21-Apr-2001  bouyer Sync with HEAD
 1.44.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.44.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.44.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.50.4.4 08-Feb-2004  msaitoh Apply patch (requested by Klaus Heinz in ticket #121):

make it compile with "options MROUTING".
 1.50.4.3 30-Nov-2003  he Pull up revisions 1.73,1.75 via patch, requested by itojun in ticket #54:
Clean up multicast routing when an interface is detached.
 1.50.4.2 04-Sep-2002  itojun pullup sys/netinet/ip_mroute.c 1.61 (itojun)

bring back old copyright notice lost in rev 1.15 (which is the authors' intent).
 1.50.4.1 06-Apr-2001  he Pull up revision 1.52 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.52.2.8 11-Nov-2002  nathanw Catch up to -current
 1.52.2.7 27-Aug-2002  nathanw Catch up to -current.
 1.52.2.6 01-Aug-2002  nathanw Catch up to -current.
 1.52.2.5 20-Jun-2002  nathanw Catch up to -current.
 1.52.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.52.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.52.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.52.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.55.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.55.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.55.2.3 16-Mar-2002  jdolecek Catch up with -current.
 1.55.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.55.2.1 03-Aug-2001  lukem update to -current
 1.56.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.59.8.4 11-Jul-2003  tron Pull up revision 1.75 (requested by itojun in ticket #1363):
better ip_mrouter_detach(). by ono@kame
 1.59.8.3 30-Jun-2003  grant Pull up revision 1.73 (requested by itojun in ticket #1345):

expire mrt if mrt_stall goes empty. ono@kame
 1.59.8.2 30-Jun-2003  grant Pull up revision 1.72 (requested by itojun in ticket #1342):

cleanup multicast routing stuff on if_detach().
 1.59.8.1 02-Aug-2002  lukem Pull up revision 1.61 (requested by itojun in ticket #595):
bring back old copyright notice lost in rev 1.15 (which is the authors' intent).
 1.59.6.2 29-Aug-2002  gehenna catch up with -current.
 1.59.6.1 20-Jun-2002  gehenna catch up with -current.
 1.75.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.75.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.75.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.75.2.4 17-Jan-2005  skrll Sync with HEAD.
 1.75.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.75.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.75.2.1 03-Aug-2004  skrll Sync with HEAD
 1.87.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.87.4.1 12-Feb-2005  yamt sync with head.
 1.87.2.1 29-Apr-2005  kent sync with -current
 1.94.2.6 11-Feb-2008  yamt sync with head.
 1.94.2.5 21-Jan-2008  yamt sync with head
 1.94.2.4 07-Dec-2007  yamt sync with head
 1.94.2.3 03-Sep-2007  yamt sync with head.
 1.94.2.2 30-Dec-2006  yamt sync with head.
 1.94.2.1 21-Jun-2006  yamt sync with head.
 1.96.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.96.10.1 11-May-2006  elad sync with head
 1.96.8.2 03-Sep-2006  yamt sync with head.
 1.96.8.1 24-May-2006  yamt sync with head.
 1.96.6.1 01-Jun-2006  kardel Sync with head.
 1.96.4.1 09-Sep-2006  rpaulo sync with head
 1.98.4.2 10-Dec-2006  yamt sync with head.
 1.98.4.1 22-Oct-2006  yamt sync with head
 1.98.2.1 18-Nov-2006  ad Sync with head.
 1.100.8.1 03-Jun-2008  skrll Sync with netbsd-4.
 1.100.4.2 07-May-2007  yamt sync with head.
 1.100.4.1 12-Mar-2007  rmind Sync with HEAD.
 1.100.2.1 17-May-2008  bouyer Pull up following revision(s) (requested by taca in ticket #1141):
sys/netinet/ip_mroute.c: revision 1.113
Make sure to clear csum_flags before forwarding the packet.
This change should fix the DIAGNOSTIC kernel panic when the machine acts
as a multicast router.
Advised from tls@ and approved by thorpej@.
 1.101.4.1 11-Jul-2007  mjf Sync with head.
 1.101.2.4 09-Oct-2007  ad Sync with head.
 1.101.2.3 15-Jul-2007  ad Sync with head.
 1.101.2.2 01-Jul-2007  ad Adapt to callout API change.
 1.101.2.1 08-Jun-2007  ad Sync with head.
 1.104.8.3 23-Mar-2008  matt sync with HEAD
 1.104.8.2 09-Jan-2008  matt sync with HEAD
 1.104.8.1 06-Nov-2007  matt sync with HEAD
 1.104.6.2 03-Dec-2007  joerg Sync with HEAD.
 1.104.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.104.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.108.6.3 18-Feb-2008  mjf Sync with HEAD.
 1.108.6.2 27-Dec-2007  mjf Sync with HEAD.
 1.108.6.1 08-Dec-2007  mjf Sync with HEAD.
 1.109.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.109.2.1 26-Dec-2007  ad Sync with head.
 1.111.10.2 04-May-2009  yamt sync with head.
 1.111.10.1 16-May-2008  yamt sync with head.
 1.111.8.2 04-Jun-2008  yamt sync with head
 1.111.8.1 18-May-2008  yamt sync with head.
 1.111.6.4 17-Jan-2009  mjf Sync with HEAD.
 1.111.6.3 05-Oct-2008  mjf Sync with HEAD.
 1.111.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.111.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.112.2.3 10-Oct-2008  skrll Sync with HEAD.
 1.112.2.2 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.112.2.1 23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.114.4.1 19-Oct-2008  haad Sync with HEAD.
 1.116.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.116.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.117.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.121.6.3 02-Jun-2012  mrg sync to latest -current.
 1.121.6.2 05-Apr-2012  mrg sync to latest -current.
 1.121.6.1 18-Feb-2012  mrg merge to -current.
 1.121.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.121.2.3 30-Oct-2012  yamt sync with head
 1.121.2.2 23-May-2012  yamt sync with head.
 1.121.2.1 17-Apr-2012  yamt sync with head
 1.122.2.1 23-Oct-2012  riz Pull up following revision(s) (requested by msaitoh in ticket #616):
sys/netinet/if_atm.c: revision 1.33
sys/net/if_arcsubr.c: revision 1.64
sys/netinet/ip_mroute.c: revision 1.126
Add missing "\n" in log(9)
 1.125.2.4 03-Dec-2017  jdolecek update from HEAD
 1.125.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.125.2.2 23-Jun-2013  tls resync from head
 1.125.2.1 20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.127.2.2 18-May-2014  rmind sync with head
 1.127.2.1 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.128.2.1 10-Aug-2014  tls Rebase.
 1.131.2.7 28-Aug-2017  skrll Sync with HEAD
 1.131.2.6 05-Feb-2017  skrll Sync with HEAD
 1.131.2.5 05-Oct-2016  skrll Sync with HEAD
 1.131.2.4 09-Jul-2016  skrll Sync with HEAD
 1.131.2.3 29-May-2016  skrll Sync with HEAD
 1.131.2.2 19-Mar-2016  skrll Sync with HEAD
 1.131.2.1 22-Sep-2015  skrll Sync with HEAD
 1.143.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.143.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.145.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.146.6.5 07-Dec-2020  martin Pull up following revision(s) (requested by kardel in ticket #1632):

sys/netinet/ip_mroute.c: revision 1.164 (patch)

PR kern/55779:

restore non-destructive guarantee of ip_mforward() mbuf
argument. This avoids generating invalid UDP checksums
on multicast packets in ip_output().

XXX the root cause of the misguided fix in 2008 should be
XXX investigated
 1.146.6.4 13-Jul-2018  martin Pull up following revision(s) via patch (requested by knakahara in ticket #905):

sys/netinet/ip_mroute.c: revision 1.160
sys/netinet6/in6_l2tp.c: revision 1.16
sys/net/if.h: revision 1.263
sys/netinet/in_l2tp.c: revision 1.15
sys/netinet/ip_icmp.c: revision 1.172
sys/netinet/igmp.c: revision 1.68
sys/netinet/ip_encap.c: revision 1.69
sys/netinet6/ip6_mroute.c: revision 1.129

sbappendaddr() requires a lock to be held. Currently, softnet_lock is appropriate.

When rip_input() is called as inetsw[].pr_input, rip_input() is always called
holding softnet_lock, that is, in case of !defined(NET_MPSAFE) it is
acquired in ipintr(), otherwise (defined(NET_MPSAFE)) it is acquired in the
PR_WRAP_INPUT macro.

However, some functions call rip_input() directly without holding softnet_lock.
That causes an assertion failure in sbappendaddr().
rip6_input() and icmp6_rip6_input() also require softnet_lock for the same
reason.
 1.146.6.3 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.146.6.2 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces copy-and-paste code such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
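A minimal userland sketch of the idea, assuming stand-in lock functions that
only count acquisitions. The macro and function names below are hypothetical
and are not copied from the tree; the point is that the NET_MPSAFE test lives
in one place instead of at every call site.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the kernel lock primitives.  They only keep
 * a depth counter so the effect of the wrappers can be observed.
 */
static int kernel_lock_depth;

static void
kernel_lock(void)
{
	kernel_lock_depth++;
}

static void
kernel_unlock(void)
{
	assert(kernel_lock_depth > 0);
	kernel_lock_depth--;
}

/*
 * Single definition site for the NET_MPSAFE switch.  These macro names
 * are illustrative only.
 */
#ifdef NET_MPSAFE
#define LOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#define UNLOCK_UNLESS_NET_MPSAFE()	do { } while (0)
#else
#define LOCK_UNLESS_NET_MPSAFE()	do { kernel_lock(); } while (0)
#define UNLOCK_UNLESS_NET_MPSAFE()	do { kernel_unlock(); } while (0)
#endif

/* A caller no longer carries its own "#ifndef NET_MPSAFE" block. */
static int
legacy_path_lock_depth(void)
{
	int depth;

	LOCK_UNLESS_NET_MPSAFE();
	depth = kernel_lock_depth;	/* 1 without NET_MPSAFE, 0 with it */
	UNLOCK_UNLESS_NET_MPSAFE();
	return depth;
}
```

Searching for the wrapper names then lists every place where the lock is
still taken conditionally, which is the identification benefit the message
describes.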
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires adding KERNEL_LOCK to functions called subsequently from
if_ioctl, such as ifmedia_ioctl and ifioctl_common, to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn it off before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general, but
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE by itself gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.146.6.1 10-Dec-2017  snj Pull up following revision(s) (requested by knakahara in ticket #419):
sys/net/if_stf.c: revision 1.103
sys/net/if_stf.h: revision 1.8
sys/netinet/in_gif.c: revision 1.89
sys/netinet/in_gif.h: revision 1.17
sys/netinet/in_l2tp.c: revision 1.4
sys/netinet/ip_encap.c: revision 1.66
sys/netinet/ip_encap.h: revision 1.23
sys/netinet/ip_mroute.c: revision 1.148
sys/netinet6/in6_gif.c: revision 1.87
sys/netinet6/in6_gif.h: revision 1.16
sys/netinet6/in6_l2tp.c: revision 1.7
sys/netipsec/xform.h: revision 1.13
sys/netipsec/xform_ipip.c: revision 1.55
Add argument to encapsw->pr_input() instead of m_tag.
 1.154.2.6 30-Sep-2018  pgoyette Sync with HEAD
 1.154.2.5 28-Jul-2018  pgoyette Sync with HEAD
 1.154.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.154.2.3 21-May-2018  pgoyette Sync with HEAD
 1.154.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.154.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.160.2.1 10-Jun-2019  christos Sync with HEAD
 1.163.12.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.163.4.1 07-Dec-2020  martin Pull up following revision(s) (requested by kardel in ticket #1143):

sys/netinet/ip_mroute.c: revision 1.164

PR kern/55779:

restore non-destructive guarantee of ip_mforward() mbuf
argument. This avoids generating invalid UDP checksums
on multicast packets in ip_output().

XXX the root cause of the misguided fix in 2008 should be
XXX investigated
 1.165.10.1 02-Aug-2025  perseant Sync with HEAD
 1.165.4.1 29-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1140):

sys/netinet/ip_output.c: revision 1.330
sys/netinet/sctp_output.c: revision 1.39
sys/netinet/ip_mroute.c: revision 1.166
sys/netipsec/ipsecif.c: revision 1.24
sys/netipsec/xform_ipip.c: revision 1.80
sys/netinet/ip_output.c: revision 1.327
sys/netinet/ip_output.c: revision 1.328
sys/netinet/ip_input.c: revision 1.406
sys/netinet/ip_output.c: revision 1.329
sys/netinet/in_var.h: revision 1.105

in: get rid of unused argument from ip_newid() and ip_newid_range()

in: take a reference of ifp on IP_ROUTETOIF
The ifp could be released after ia4_release(ia).

in: narrow the scope of ifa in ip_output (NFC)

sctp: follow the recent change of ip_newid()

in: avoid racy ifa_acquire(rt->rt_ifa) in ip_output()
If a rtentry is being destroyed asynchronously, the ifa referenced by rt_ifa
can be destroyed, and calling ifa_acquire(rt->rt_ifa) then aborts with a
KASSERT failure. Fortunately, the ifa is not actually freed because of
a reference by rt_ifa, so it remains usable (except by some functions such as
psref) as long as the rtentry is held.
PR kern/59527

in: avoid racy ia4_acquire(ifatoia(rt->rt_ifa)) in ip_rtaddr()
Same as the case of ip_output(), it's racy and should be avoided.
PR kern/59527
 1.36 28-Jul-2025  kim Don't hide struct igmpmsg under _KERNEL

The struct is documented as a communication structure with userland.
Fixes PR kern/59561.
 1.35 03-Feb-2021  roy branches: 1.35.18; 1.35.24;
CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.34 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.33 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
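
The approach taken in revisions 1.33 and 1.34 above (drop __packed, then
assert the layout at compile time) can be sketched as follows. This is an
illustration only: the structure is hypothetical and is not one of the
structures touched by those commits, and C11 static_assert stands in for the
CTASSERT/__CTASSERT macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 8-byte wire header, invented for this example.  Every
 * member is naturally aligned, so no packing attribute is needed.
 */
struct demo_hdr {
	uint8_t  dh_type;
	uint8_t  dh_code;
	uint16_t dh_cksum;
	uint32_t dh_group;
};

/* Compile-time checks: the build fails if padding ever creeps in. */
static_assert(sizeof(struct demo_hdr) == 8, "demo_hdr must be 8 bytes");
static_assert(offsetof(struct demo_hdr, dh_cksum) == 2, "dh_cksum at 2");
static_assert(offsetof(struct demo_hdr, dh_group) == 4, "dh_group at 4");

/* Runtime view of the same fact, convenient for a unit test. */
static size_t
demo_hdr_size(void)
{
	return sizeof(struct demo_hdr);
}
```

Because the members are already aligned, the compiler inserts no padding and
the assertions hold without any attribute on the structure.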
 1.32 07-Feb-2018  maxv branches: 1.32.16;
Remove RSVP_ISI, that's mostly dead code. FreeBSD and OpenBSD too removed
it; FreeBSD kept some pieces but they are mostly no-ops.

Sent on tech-net@, no comment.
 1.31 07-Aug-2008  cegger make this compile as proposed by dholland@
 1.30 06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.29 25-Dec-2007  perry branches: 1.29.6; 1.29.10; 1.29.12; 1.29.16;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.28 09-Jul-2007  ad branches: 1.28.8; 1.28.14; 1.28.16; 1.28.20;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.27 04-Mar-2007  christos branches: 1.27.2; 1.27.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.26 10-Dec-2005  elad branches: 1.26.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.25 26-Feb-2005  perry branches: 1.25.4;
nuke trailing whitespace
 1.24 04-Sep-2004  manu branches: 1.24.4; 1.24.6;
IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.23 21-Apr-2004  itojun no space between function name and paren: foo (blah) -> foo(blah)
 1.22 18-Apr-2004  matt De __P()
 1.21 26-Jun-2003  itojun branches: 1.21.2;
cleanup multicast routing stuff on if_detach().
XXX sideeffect to running instance of multicast routing daemon unknown
 1.20 09-Jun-2002  itojun whitespace
 1.19 08-May-2001  itojun branches: 1.19.2; 1.19.14; 1.19.16;
pull encapsulated packet for vif* via ip_encap framework.
 1.18 23-Mar-2000  thorpej branches: 1.18.4; 1.18.6;
New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
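
The two properties listed above (caller-supplied handle storage, constant-time
insertion and removal) can be illustrated with an intrusive doubly linked
list. This is a userland sketch with hypothetical names and a single bucket;
it is not the kernel's callout implementation and it omits timing entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle; the caller embeds it in its own structure. */
struct demo_callout {
	struct demo_callout *dc_next;
	struct demo_callout **dc_prevp;	/* address of the pointer to us */
	void (*dc_func)(void *);
	void *dc_arg;
};

/* One bucket of pending handles. */
static struct demo_callout *demo_bucket;

/* Must be called once before the handle is first used. */
static void
demo_callout_init(struct demo_callout *c)
{
	c->dc_next = NULL;
	c->dc_prevp = NULL;
	c->dc_func = NULL;
	c->dc_arg = NULL;
}

/* Constant time: link at the head; nothing is allocated. */
static void
demo_callout_insert(struct demo_callout *c, void (*func)(void *), void *arg)
{
	assert(c->dc_prevp == NULL);	/* must not already be pending */
	c->dc_func = func;
	c->dc_arg = arg;
	c->dc_next = demo_bucket;
	c->dc_prevp = &demo_bucket;
	if (demo_bucket != NULL)
		demo_bucket->dc_prevp = &c->dc_next;
	demo_bucket = c;
}

/* Constant time: unlink through the back pointer; no list walk. */
static void
demo_callout_remove(struct demo_callout *c)
{
	if (c->dc_prevp == NULL)
		return;			/* not pending */
	*c->dc_prevp = c->dc_next;
	if (c->dc_next != NULL)
		c->dc_next->dc_prevp = c->dc_prevp;
	c->dc_next = NULL;
	c->dc_prevp = NULL;
}

/* Number of pending handles; linear, used only for inspection. */
static int
demo_callout_pending(void)
{
	int n = 0;

	for (const struct demo_callout *c = demo_bucket; c != NULL; c = c->dc_next)
		n++;
	return n;
}
```

Since the handle lives in the caller's own storage, scheduling cannot fail for
lack of memory, which is the resource-allocation problem the message refers to.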
 1.17 20-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol data.
 1.16 01-Jul-1999  itojun branches: 1.16.2; 1.16.8;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.15 11-Jan-1999  thorpej branches: 1.15.4; 1.15.6;
Adjust for the new IP-IP input path. mrt_ipip_input() is called from
ipip_input(), and returns non-zero if mrt_ipip_input() handled the
packet.

XXX Eventually, the multicast code should probably use regular IP-IP
XXX `interfaces', but mrouted knows about the VIF table, etc.
 1.14 22-Dec-1998  thorpej ipip_input() -> mrt_ipip_input().
 1.13 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.12 09-Sep-1996  mycroft Rework the token bucket filter to use a list of packets rather than a static
array. Also, fix several memory leaks. From Bill Fenner.
 1.11 23-Jun-1996  mycroft Return ENOPROTOOPT rather than picking pseudo-random error values.
Don't allow SIOCGET{VIF,SG}CNT from sockets other than the multicast router.
Restructure rip_ctloutput() like ip_ctloutput(), and fix memory leaks.
 1.10 13-Feb-1996  christos branches: 1.10.4;
netinet prototypes
 1.9 31-May-1995  mycroft Integrate multicast 3.5 distribution, with several bugs fixed and general
cleanup. This is a (working) snapshot of work in progress.
 1.8 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.7 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 09-Jun-1994  brezak Update to version 2 mrouting; from pre-4.4lite NetBSD + 4.4 mods
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 09-Jan-1994  mycroft Prototype the rest.
 1.2 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.1 06-Dec-1993  hpeyerl branches: 1.1.1;
multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.1 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.10.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.15.6.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.15.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.15.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.16.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.16.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.18.6.2 20-Jun-2002  nathanw Catch up to -current.
 1.18.6.1 21-Jun-2001  nathanw Catch up to -current.
 1.18.4.1 30-Nov-2003  he Pull up revision 1.21 via patch, requested by itojun in ticket #54:
Clean up multicast routing when an interface is detached.
 1.19.16.1 30-Jun-2003  grant Pull up revision 1.21 (requested by itojun in ticket #1342):

cleanup multicast routing stuff on if_detach().
 1.19.14.1 20-Jun-2002  gehenna catch up with -current.
 1.19.2.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.21.2.5 11-Dec-2005  christos Sync with head.
 1.21.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.21.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.21.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.21.2.1 03-Aug-2004  skrll Sync with HEAD
 1.24.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.24.4.1 29-Apr-2005  kent sync with -current
 1.25.4.3 21-Jan-2008  yamt sync with head
 1.25.4.2 03-Sep-2007  yamt sync with head.
 1.25.4.1 21-Jun-2006  yamt sync with head.
 1.26.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.27.4.1 11-Jul-2007  mjf Sync with head.
 1.27.2.1 01-Jul-2007  ad Adapt to callout API change.
 1.28.20.1 02-Jan-2008  bouyer Sync with HEAD
 1.28.16.1 26-Dec-2007  ad Sync with head.
 1.28.14.1 18-Feb-2008  mjf Sync with HEAD.
 1.28.8.1 09-Jan-2008  matt sync with HEAD
 1.29.16.1 19-Oct-2008  haad Sync with HEAD.
 1.29.12.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.29.10.1 04-May-2009  yamt sync with head.
 1.29.6.1 28-Sep-2008  mjf Sync with HEAD.
 1.32.16.1 03-Apr-2021  thorpej Sync with HEAD.
 1.35.24.1 02-Aug-2025  perseant Sync with HEAD
 1.35.18.1 29-Jul-2025  martin Pull up following revision(s) (requested by kim in ticket #1141):

sys/netinet/ip_mroute.h: revision 1.36

Don't hide struct igmpmsg under _KERNEL

The struct is documented as a communication structure with userland.

Fixes PR kern/59561.
 1.63 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.62 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.61 08-Jul-2004  christos Bring in flags from 4.1.2 to make things compile.
 1.60 29-Jun-2004  christos PR/25999: Jeff Rizzo: ipf: ipnat is corrupting "bimap" translations in 2.0_BETA and -current
 1.59 20-May-2004  christos PR/25646: Perry Metzger: Commit a patch that compiles awaiting feedback.
 1.58 10-May-2004  christos PR/25103: Martin Husemann: IP Filter 4.4.1 breaks some connections when NATing
patch from Darren applied.
 1.57 10-May-2004  christos PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.56 28-Mar-2004  martti branches: 1.56.2;
Upgraded IPFilter to 4.1.1
 1.55 04-Dec-2003  christos fix unused variable warnings when LARGE_NAT is defined.
 1.54 24-Sep-2002  sommerfeld branches: 1.54.6;
Relax overly-conservative TCP option parsing used by ipnat when
hunting for an MSS option to clamp. The previous code assumed that at least
one more byte of options (such as a TCPOPT_EOL) would follow the MSS
option; now, we allow the MSS option to end on the last byte of the
TCP header.

Packets have been observed "in the wild" with a TCP header length of
'6' (24 bytes.. 20 bytes fixed header, 4 bytes options) with a 4-byte
MSS option exactly filling the 4 bytes of options payload and no
following TCPOPT_EOL.

RFC793 is quite explicit that the EOL byte:

" .. need only be used if the end of the options would not
otherwise coincide with the end of the TCP header."
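
A sketch of the relaxed bounds check described above, under stated
assumptions: the function and constant names are hypothetical, the buffer
holds only the option bytes that follow the 20-byte fixed header, and the
checksum fix-up a real NAT would need is left out. It is not the ip_nat.c
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TCPOPT_EOL		0
#define DEMO_TCPOPT_NOP		1
#define DEMO_TCPOPT_MAXSEG	2
#define DEMO_TCPOLEN_MAXSEG	4

/*
 * Scan TCP option bytes and, if an MSS option carries a value above
 * 'clamp', rewrite it in place.  Returns 1 if rewritten, 0 otherwise.
 */
static int
demo_mss_clamp(uint8_t *opts, size_t optlen, uint16_t clamp)
{
	size_t i = 0;

	assert(opts != NULL || optlen == 0);

	while (i < optlen) {
		uint8_t kind = opts[i];

		if (kind == DEMO_TCPOPT_EOL)
			break;
		if (kind == DEMO_TCPOPT_NOP) {
			i++;
			continue;
		}
		/* Every other option kind carries a length byte. */
		if (i + 1 >= optlen)
			break;
		uint8_t len = opts[i + 1];
		if (len < 2 || i + len > optlen)
			break;		/* malformed or truncated */
		if (kind == DEMO_TCPOPT_MAXSEG && len == DEMO_TCPOLEN_MAXSEG) {
			/* i + len == optlen is fine: no trailing EOL needed. */
			uint16_t mss = (uint16_t)((opts[i + 2] << 8) | opts[i + 3]);

			if (mss <= clamp)
				return 0;
			opts[i + 2] = (uint8_t)(clamp >> 8);
			opts[i + 3] = (uint8_t)(clamp & 0xff);
			return 1;
		}
		i += len;
	}
	return 0;
}
```

The comparison "i + len > optlen" (rather than ">=") is what lets a 4-byte
MSS option exactly fill a 4-byte option area, the case the message reports
seeing in the wild.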
 1.53 19-Sep-2002  martti Resync with official IPF
 1.52 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.51 09-Jun-2002  itojun whitespace
 1.50 05-Jun-2002  itojun typo/bound check fix from YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
 1.49 04-Jun-2002  itojun in mss clamping code, do not go past TCPOPT_EOL. enforce stricter
boundary checking. discussed on tech-net
 1.48 02-May-2002  martti branches: 1.48.2; 1.48.4;
Fix compilation problems
 1.47 02-May-2002  martti Upgraded IPFilter to 3.4.27
 1.46 14-Mar-2002  martin Add MSS clamping to the IP Filter NAT subsystem.

Configured by a new option "mssclamp" in NAT rules, like:

map pppoe0 192.168.1.0/24 -> 0/32 mssclamp 1452

This is based on work by Xiaodan Tang <xtang@qnx.com>.
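
The value 1452 in the example rule is consistent with the usual PPPoE
arithmetic, sketched below. The constants are standard header sizes; the
function name is hypothetical, and the commit message itself does not state
how the number was chosen.

```c
#include <assert.h>

enum {
	DEMO_ETHER_MTU		= 1500,	/* Ethernet payload size */
	DEMO_PPPOE_OVERHEAD	= 8,	/* 6-byte PPPoE header + 2-byte PPP protocol ID */
	DEMO_IPV4_HDR		= 20,	/* IPv4 header without options */
	DEMO_TCP_HDR		= 20	/* TCP header without options */
};

/* Largest TCP payload that fits in one IPv4 packet of the given MTU. */
static int
demo_mss_for_mtu(int mtu)
{
	assert(mtu > DEMO_IPV4_HDR + DEMO_TCP_HDR);
	return mtu - DEMO_IPV4_HDR - DEMO_TCP_HDR;
}
```

For a PPPoE link, demo_mss_for_mtu(DEMO_ETHER_MTU - DEMO_PPPOE_OVERHEAD)
gives 1452, while a plain Ethernet MTU of 1500 gives 1460.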
 1.45 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.44 24-Jan-2002  martti Re-sync with IPFilter
 1.43 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.42 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.41 13-Nov-2001  lukem add RCSIDs
 1.40 20-May-2001  martin branches: 1.40.2;
Remove tests for IPN_FRAG bits.
There is no place in the source where this bit could ever be set (or I'm
too blind to find it).

This fixes PR 12671.

If someone thinks this is the wrong solution, please make sure to (a) reopen
the PR and (b) explain to me how the tested bits would ever get set. I'll
be glad to then look further for the real cause (i.e. the flags not getting
set in the case described in the PR).
 1.39 06-Apr-2001  darrenr fix fragment cache security hole
 1.38 26-Mar-2001  mike Resolve conflicts.
 1.37 05-Feb-2001  chs branches: 1.37.2;
expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.36 12-Aug-2000  veego Apply fix from IWAMOTO Toshihiro in pr#10813:
rev 1.35 of ip_nat.c checks if packets are too short.
For ICMP packets, this packet length checking double counts
the length of an IP header contained in ICMP messages.
So, unless ICMP packets are long enough (such as echo-reply),
packets are mistakenly considered too short and are dropped.
 1.35 09-Aug-2000  veego Resolve conflicts.
 1.34 12-Jun-2000  veego branches: 1.34.2;
Resolve conflicts.
 1.33 21-May-2000  veego branches: 1.33.2;
Resolve conflicts.
 1.32 11-May-2000  veego Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.31 03-May-2000  veego Resolve conflicts.
 1.30 16-Apr-2000  chs remove ifdefs to skip htons() on some big-endian platforms.
 1.29 30-Mar-2000  augustss Remove register declarations.
 1.28 01-Feb-2000  veego Resolve conflicts.
 1.27 28-Dec-1999  darrenr update ipfilter code to 3.3.6
 1.26 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.25 05-Mar-1999  mycroft branches: 1.25.2; 1.25.8; 1.25.14;
Minor cleanup to use LONG_SUM() and CALC_SUMD() more.
 1.24 02-Feb-1999  cjs Remove SCCS markers and make these compile in $NetBSD$ IDs.
 1.23 22-Nov-1998  mrg branches: 1.23.2;
merge ipf 3.2.10
 1.22 15-Nov-1998  drochner fix the previous: "securelevel" in kernel only
 1.21 14-Nov-1998  tls In 'highly secure' mode (securelevel >= 2), the filter lists may not be tampered with. It might be desirable to allow enabling of preset filter lists, but it seems too good a candidate for a denial-of-service attack, so we don't.
 1.20 12-Jul-1998  veego Resolve conflicts from the import.
 1.19 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.18 17-May-1998  veego Resolve conflicts
 1.17 29-Mar-1998  scottr Change from IP-Filter 3.2.3: avoid infinite loop in nat_new() when
NAT'ing to a single IP address.
 1.16 25-Nov-1997  mrg fixes for memory leaks in proxying, and byte ordering problems. from darren reed.
 1.15 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.14 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.13 21-Sep-1997  veego branches: 1.13.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.12 21-Jul-1997  kleink branches: 1.12.2;
Nuke an `#ifdef sparc' conditional around ntohs() usage: this (1) is incomplete
and (2) makes no difference anyway. Also, minor KNF.
 1.11 16-Jul-1997  kleink Fix a misplaced brace which caused NAT list corruption; from Dave Huang
<khym@bga.com> in PR kern/3872.
 1.10 06-Jul-1997  thorpej Restore original RCS IDs.
 1.9 06-Jul-1997  thorpej Deal with a bogus warning from -Wuninitialized.
 1.8 05-Jul-1997  darrenr fix conflicts from import
 1.7 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.6 27-May-1997  thorpej Make this compile on 32-bit architectures again:
- Add prototypes.
- Get arguments to ioctl right (cmd is a u_long in NetBSD)
 1.5 25-May-1997  darrenr fix conflicts
 1.4 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.3 29-Jan-1997  thorpej ioctl cmd arguments are u_long, not int. Pointed out by
Fred L. Templin <templin@nas.nasa.gov>
 1.2 05-Jan-1997  veego Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.27 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.26 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.25 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.24 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.23 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.22 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.21 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.20 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.19 12-Jun-2000  veego Import IP Filter 3.4.6
 1.1.1.18 21-May-2000  veego Import IP Filter 3.4.3
 1.1.1.17 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.16 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.15 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.14 28-Dec-1999  darrenr update DARRENR branch of netinet to 3.3.6
 1.1.1.13 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.12 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.11 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.10 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.12.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.13.2.5 24-Nov-1998  cgd pull up rev(s) 1.23 from trunk (ipfilter 3.2.10). (mrg)
 1.13.2.4 22-Jul-1998  mellon Pull up 1.20 (veego)
 1.13.2.3 25-Nov-1997  mrg pull up from trunk: fixes for memory leaks in proxying, and byte ordering problems. from darren reed.
 1.13.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.13.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.23.2.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.25.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.25.8.4 21-Apr-2001  bouyer Sync with HEAD
 1.25.8.3 27-Mar-2001  bouyer Sync with HEAD.
 1.25.8.2 11-Feb-2001  bouyer Sync with HEAD.
 1.25.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.25.2.3 14-Apr-2001  he Pull up revision 1.39 (requested by darrenr):
Fix bug related to fragment cache handling.
 1.25.2.2 08-Jan-2000  he Pull up revision 1.27 (requested by darrenr):
Update IPF to version 3.3.6.
 1.25.2.1 20-Dec-1999  he Pull up revision 1.26 (requested by darrenr):
Update IPF to version 3.3.5.
 1.33.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.34.2.6 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.34.2.5 26-Feb-2002  he Apply patch (requested by darrenr):
Prevent panic when packets are received in a certain order.
 1.34.2.4 09-Feb-2002  he Pull up revisions 1.37-1.38,1.40-1.44 (via patch, requested by martti):
Updated IPFilter to 3.4.23.
 1.34.2.3 14-Apr-2001  he Pull up revision 1.39 (requested by darrenr):
Fix bug related to fragment cache handling.
 1.34.2.2 31-Aug-2000  veego Pull up revision 1.36 (requested by veego). Approved by releng-1-5.

>Apply fix from IWAMOTO Toshihiro in pr#10813:
> rev 1.35 of ip_nat.c checks if packets are too short.
> For ICMP packets, this packet length checking double counts
> the length of an IP header contained in ICMP messages.
> So, unless ICMP packets are long enough (such as echo-reply),
> packets are mistakenly considered too short and are dropped.
 1.34.2.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Casper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Casper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.37.2.10 18-Oct-2002  nathanw Catch up to -current.
 1.37.2.9 20-Sep-2002  thorpej Sync with HEAD.
 1.37.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.37.2.7 04-May-2002  thorpej Update from trunk.
 1.37.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.37.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.37.2.4 08-Jan-2002  nathanw Catch up to -current.
 1.37.2.3 14-Nov-2001  nathanw Catch up to -current.
 1.37.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.37.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.40.2.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.40.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.40.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.40.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.48.4.6 30-Dec-2003  jmc Backout changes from #1568 as too many people are reporting problems
with "out of the box" configurations.
 1.48.4.5 26-Nov-2003  cyber Patch (requested by jklos ticket #1564):
Change to ip filter's NAT code to keep excessive NAT entries from
causing the kernel to panic.
 1.48.4.4 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.48.4.3 13-Oct-2002  lukem Pull up revision 1.54 (requested by sommerfeld in ticket #884):
Relax overly-conservative TCP option parsing used by ipnat when
hunting for an MSS option to clamp. The previous code assumed that at least
one more byte of options (such as a TCPOPT_EOL) would follow the MSS
option; now, we allow the MSS option to end on the last byte of the
TCP header.
Packets have been observed "in the wild" with a TCP header length of
'6' (24 bytes.. 20 bytes fixed header, 4 bytes options) with a 4-byte
MSS option exactly filling the 4 bytes of options payload and no
following TCPOPT_EOL.
RFC793 is quite explicit that the EOL byte:
" .. need only be used if the end of the options would not
otherwise coincide with the end of the TCP header."
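The relaxed bounds check described for revision 1.54 can be illustrated with a minimal C sketch. This is not the ipnat code itself: the function name find_mss_option and the local constant definitions are assumptions made for illustration, with option kinds and lengths as given in RFC 793. The point of interest is the comparison `i + len > optlen`, which lets an option end exactly on the last byte of the options area, where a stricter `>=` would demand at least one trailing byte such as TCPOPT_EOL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL	0
#define TCPOPT_NOP	1
#define TCPOPT_MAXSEG	2
#define TCPOLEN_MAXSEG	4

/*
 * Hypothetical helper: scan optlen bytes of TCP options and return the
 * offset of the MSS option, or -1 if no well-formed MSS option is found.
 */
static int
find_mss_option(const uint8_t *opts, size_t optlen)
{
	size_t i = 0;
	uint8_t kind, len;

	assert(opts != NULL || optlen == 0);
	while (i < optlen) {
		kind = opts[i];
		if (kind == TCPOPT_EOL)
			break;		/* never look past end-of-list */
		if (kind == TCPOPT_NOP) {
			i++;		/* single-byte option */
			continue;
		}
		if (i + 1 >= optlen)
			break;		/* no room for the length byte */
		len = opts[i + 1];
		/* Reject bogus lengths and options that overrun the header. */
		if (len < 2 || i + len > optlen)
			break;
		if (kind == TCPOPT_MAXSEG && len == TCPOLEN_MAXSEG)
			return (int)i;
		i += len;
	}
	return -1;
}
```

With a 24-byte TCP header (4 bytes of options holding only the MSS option), the helper returns offset 0 even though no TCPOPT_EOL follows.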
 1.48.4.2 05-Jun-2002  lukem Pull up revision 1.50 (requested by itojun in ticket #177):
typo/bound check fix from YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
 1.48.4.1 05-Jun-2002  lukem Pull up revision 1.49 (requested by itojun in ticket #154):
in mss clamping code, do not go past TCPOPT_EOL. enforce stricter
boundary checking. discussed on tech-net
 1.48.2.1 20-Jun-2002  gehenna catch up with -current.
 1.54.6.2 19-Oct-2004  skrll Sync with HEAD
 1.54.6.1 03-Aug-2004  skrll Sync with HEAD
 1.56.2.5 13-Aug-2004  jmc branches: 1.56.2.5.2;
Pullup rev 1.61-1.62 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.56.2.4 02-Jul-2004  he Pull up revision 1.60 (requested by christos in ticket #571):
Fix problem where ipnat is corrupting ``bimap'' translations.
Fixes PR#25999.
 1.56.2.3 30-May-2004  tron Pull up revision 1.59 (requested by christos in ticket #416):
PR/25646: Perry Metzger: Commit a patch that compiles awaiting feedback.
 1.56.2.2 30-May-2004  tron Pull up revision 1.58 (requested by christos in ticket #416):
PR/25103: Martin Husemann: IP Filter 4.4.1 breaks some connections when NATing
patch from Darren applied.
 1.56.2.1 30-May-2004  tron Pull up revision 1.57 (requested by christos in ticket #416):
PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.56.2.5.2.1 06-Feb-2005  jmc Pull up revision 1.63 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.35 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.34 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.33 08-Jul-2004  christos Bring in flags from 4.1.2 to make things compile.
 1.32 10-May-2004  christos PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.31 28-Mar-2004  martti branches: 1.31.2;
Upgraded IPFilter to 4.1.1
 1.30 16-Jan-2004  abs Allow DEF_NAT_AGE to be set in kernel config.
 1.29 03-Jan-2004  tron Remove extra tokens at end of #undef directive.
 1.28 16-Dec-2003  abs Comment out #undef LARGE_NAT so LARGE_NAT can be set in a kernel config file
without having to edit this file as well.
 1.27 19-Sep-2002  martti branches: 1.27.6;
Upgraded IPFilter to 3.4.29
 1.26 02-May-2002  martti branches: 1.26.4;
Upgraded IPFilter to 3.4.27
 1.25 14-Mar-2002  martin Add MSS clamping to the IP Filter NAT subsystem.

Configured by a new option "mssclamp" in NAT rules, like:

map pppoe0 192.168.1.0/24 -> 0/32 mssclamp 1452

This is based on work by Xiaodan Tang <xtang@qnx.com>.
 1.24 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.23 06-Apr-2001  darrenr branches: 1.23.2;
fix fragment cache security hole
 1.22 26-Mar-2001  mike Resolve conflicts.
 1.21 09-Aug-2000  veego branches: 1.21.2;
Resolve conflicts.
 1.20 12-Jun-2000  veego branches: 1.20.2;
Resolve conflicts.
 1.19 21-May-2000  veego branches: 1.19.2;
Resolve conflicts.
 1.18 03-May-2000  veego Resolve conflicts.
 1.17 01-Feb-2000  veego Resolve conflicts.
 1.16 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.15 22-Nov-1998  mrg branches: 1.15.4; 1.15.10; 1.15.16;
merge ipf 3.2.10
 1.14 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.13 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.12 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.11 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.10 21-Sep-1997  veego branches: 1.10.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.9 06-Jul-1997  thorpej branches: 1.9.2;
Restore original RCS IDs.
 1.8 05-Jul-1997  darrenr fix conflicts from import
 1.7 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.6 27-May-1997  thorpej Make this compile on 32-bit architectures again:
- Get arguments to ioctl right (cmd is a u_long in NetBSD)
 1.5 25-May-1997  darrenr fix conflicts
 1.4 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.3 29-Jan-1997  thorpej ioctl cmd arguments are u_long, not int. Pointed out by
Fred L. Templin <templin@nas.nasa.gov>
 1.2 05-Jan-1997  veego Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.24 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.23 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.22 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.21 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.20 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.19 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.18 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.17 12-Jun-2000  veego Import IP Filter 3.4.6
 1.1.1.16 21-May-2000  veego Import IP Filter 3.4.3
 1.1.1.15 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.14 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.13 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.12 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.11 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.10 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.9.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.10.2.4 24-Nov-1998  cgd pull up rev(s) 1.15 from trunk (ipfilter 3.2.10). (mrg)
 1.10.2.3 22-Jul-1998  mellon Pull up 1.14 (veego)
 1.10.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.10.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.15.16.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.15.10.3 21-Apr-2001  bouyer Sync with HEAD
 1.15.10.2 27-Mar-2001  bouyer Sync with HEAD.
 1.15.10.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.15.4.2 14-Apr-2001  he Pull up revision 1.23 (via patch, requested by darrenr):
Fix bug related to fragment cache handling.
 1.15.4.1 20-Dec-1999  he Pull up revision 1.16 (requested by darrenr):
Update IPF to version 3.3.5.
 1.19.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.20.2.4 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.20.2.3 09-Feb-2002  he Pull up revisions 1.22,1.24 (requested by martti):
Updated IPFilter to 3.4.23.
 1.20.2.2 14-Apr-2001  he Pull up revision 1.23 (requested by darrenr):
Fix bug related to fragment cache handling.
 1.20.2.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Caseper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Caseper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.21.2.5 20-Sep-2002  thorpej Sync with HEAD.
 1.21.2.4 04-May-2002  thorpej Update from trunk.
 1.21.2.3 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.21.2.2 28-Feb-2002  nathanw Catch up to -current.
 1.21.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.23.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.23.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.23.2.1 11-Feb-2002  jdolecek Sync w/ -current.
 1.26.4.3 30-Dec-2003  jmc Backout changes from #1568 as too many people are reporting problems
with "out of the box" configurations.
 1.26.4.2 26-Nov-2003  cyber Patch (requested by jklos ticket #1564):
Change to ip filter's NAT code to keep excessive NAT entries from
causing the kernel to panic.
 1.26.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.27.6.2 19-Oct-2004  skrll Sync with HEAD
 1.27.6.1 03-Aug-2004  skrll Sync with HEAD
 1.31.2.2 13-Aug-2004  jmc branches: 1.31.2.2.2;
Pullup rev 1.33-1.34 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.31.2.1 30-May-2004  tron Pull up revision 1.32 (requested by christos in ticket #416):
PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.31.2.2.2.1 06-Feb-2005  jmc Pull up revision 1.35 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.6 02-Oct-2004  christos These are ipfilter files, although they don't have the same copyright.
Thanks jaromir.
 1.5 28-Mar-2004  martti branches: 1.5.4;
Upgraded IPFilter to 4.1.1
 1.4 19-Sep-2002  martti branches: 1.4.6;
Resync with official IPF
 1.3 09-Jun-2002  itojun whitespace
 1.2 01-Apr-2002  jdolecek branches: 1.2.2;
add __KERNEL_RCSID()
 1.1 24-Jan-2002  martti branches: 1.1.1;
Initial revision
 1.1.1.2 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.1 24-Jan-2002  martti branches: 1.1.1.1.2; 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 3.4.23
 1.1.1.1.6.5 20-Sep-2002  thorpej Sync with HEAD.
 1.1.1.1.6.4 20-Jun-2002  nathanw Catch up to -current.
 1.1.1.1.6.3 17-Apr-2002  nathanw Catch up to -current.
 1.1.1.1.6.2 28-Feb-2002  nathanw Catch up to -current.
 1.1.1.1.6.1 24-Jan-2002  nathanw file ip_netbios_pxy.c was added on branch nathanw_sa on 2002-02-28 04:15:09 +0000
 1.1.1.1.4.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.1.1.1.4.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.1.1.1.4.3 16-Mar-2002  jdolecek Catch up with -current.
 1.1.1.1.4.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.1.1.1.4.1 24-Jan-2002  jdolecek file ip_netbios_pxy.c was added on branch kqueue on 2002-02-11 20:10:36 +0000
 1.1.1.1.2.4 27-Nov-2002  itojun sys/netinet/ip_h323_pxy.c via patch
sys/netinet/ip_ipsec_pxy.c via patch
sys/netinet/ip_netbios_pxy.c via patch

Fix compilation on a.out systems.

(Thorsten Frueauf)
 1.1.1.1.2.3 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.1.1.1.2.2 09-Feb-2002  he Pull up revision 1.1 (new, requested by martti):
Updated IPFilter to 3.4.23
 1.1.1.1.2.1 24-Jan-2002  he file ip_netbios_pxy.c was added on branch netbsd-1-5 on 2002-02-09 16:55:22 +0000
 1.2.2.1 20-Jun-2002  gehenna catch up with -current.
 1.4.6.4 19-Oct-2004  skrll Sync with HEAD
 1.4.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.4.6.1 03-Aug-2004  skrll Sync with HEAD
 1.5.4.1 06-Feb-2005  jmc Pull up revision 1.6 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.330 17-Jul-2025  ozaki-r in: avoid racy ifa_acquire(rt->rt_ifa) in ip_output()

If a rtentry is being destroyed asynchronously, the ifa referenced by rt_ifa
can be destroyed, and calling ifa_acquire(rt->rt_ifa) then aborts with a
KASSERT failure. Fortunately, the ifa is not actually freed because rt_ifa
still holds a reference to it, so it remains usable (except by some functions
such as psref) as long as the rtentry is held.

PR kern/59527
 1.329 11-Jun-2025  ozaki-r in: narrow the scope of ifa in ip_output (NFC)
 1.328 11-Jun-2025  ozaki-r in: take a reference of ifp on IP_ROUTETOIF

The ifp could be released after ia4_release(ia).
 1.327 11-Jun-2025  ozaki-r in: get rid of unused argument from ip_newid() and ip_newid_range()
 1.326 19-Apr-2023  mlelstv branches: 1.326.6;
Again allow multicast packets to be sent from unnumbered interfaces.
 1.325 19-Apr-2023  ozaki-r Revert "Fix panic on packet sending via a route with rt_ifa of AF_LINK."

The fix was upstreamed by mistake.
 1.324 21-Nov-2022  knakahara branches: 1.324.2;
Fix panic on packet sending via a route with rt_ifa of AF_LINK.

A route with rt_ifa of AF_LINK can be set by some routing daemons when
it adds a route that has a gateway of AF_LINK. If there is no address on
a target interface, the kernel sets an AF_LINK address of the interface to
rt_ifa of the route. In that case, a variable of a local address in
ip_output (ia) can be NULL and we need more NULL-checks of it.
 1.323 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.322 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to be aware of it
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
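The embedding pattern described in this commit can be sketched in C as follows. The structure layouts, the inp_af field, the PCB_AF_INET4 tag, the in4p_ip_laddr member, and the pcb_set_laddr4 helper are all simplified assumptions for illustration; only the names struct inpcb, struct in4pcb, and in4p_laddr() come from the commit message, and the real NetBSD definitions differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCB_AF_INET4	4	/* hypothetical family tag for this sketch */

/* Common part shared by the IPv4 and IPv6 PCBs (heavily simplified). */
struct inpcb {
	int inp_af;
};

/* IPv4-specific PCB that embeds the common part as its first member. */
struct in4pcb {
	struct inpcb in4p_pcb;	/* first member, so a pointer cast is valid */
	uint32_t in4p_ip_laddr;	/* hypothetical: local IPv4 address */
};

/* Accessor macro: callers keep holding a struct inpcb pointer. */
#define in4p_laddr(inp)	(((struct in4pcb *)(inp))->in4p_ip_laddr)

static void
pcb_set_laddr4(struct inpcb *inp, uint32_t addr)
{
	/* The cast is only safe when the PCB really is an IPv4 one. */
	assert(inp != NULL && inp->inp_af == PCB_AF_INET4);
	in4p_laddr(inp) = addr;
}
```

Because the common structure sits at offset 0, code that only holds a struct inpcb pointer can reach the IPv4-only data through the macro, while an IPv6 PCB does not have to carry those fields.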
 1.321 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.320 08-Sep-2020  christos Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.319 28-Aug-2020  christos Don't cache the sa, because we are dealing with multiple mbufs (from ozaki-r)
 1.318 28-Aug-2020  ozaki-r inet: reduce silent packet discards
 1.317 28-Aug-2020  ozaki-r inet: reduce indents of a normal path to improve readability (NFCI)
 1.316 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.315 27-Dec-2019  msaitoh s/referece/reference/ in comment.
 1.314 05-Jun-2019  knakahara Packets which will be esp-fragmented should not have pfil applied. Pointed out by ohishi@IIJ, thanks.
 1.313 05-Jun-2019  knakahara Fix a bug where the rtcache cannot be released once an esp-fragmented packet is sent. Pointed out by ohishi@IIJ, thanks.
 1.312 15-May-2019  ozaki-r Get rid of IFNET_LOCK for if_mcast_op to avoid a deadlock

The IFNET_LOCK was added to avoid data races on if_flags for IFF_ALLMULTI.
Unfortunately it caused a deadlock instead. A known scenario causing a
deadlock is when the following two operations occur concurrently: (a) a removal of
an IP address assigned to an interface and (b) a manipulation of multicast
groups on the interface. The resource dependency graph is like this:
softnet_lock => IFNET_LOCK => psref_target_destroy => softint => softnet_lock

Thanks to the previous commit that avoids data races on if_flags for
IFF_ALLMULTI by another approach, we can remove IFNET_LOCK and defuse the
deadlock.

PR kern/54189
 1.311 13-May-2019  ozaki-r Count packets dropped by pfil
 1.310 04-Feb-2019  mrg rework the #ifdef IPSEC code to not use fallthru.
same number of lines with more local context.
 1.309 22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.308 12-Dec-2018  rin Simplify logic in ip{,6}_output().

Now, we have M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx when
TSO[46] is enabled for the interface. So we can simply check
whether TSO[46] is required in a packet but missing in the
interface by (sw_csum & M_CSUM_TSOv[46]).

Note that this is a very rare case where TSO[46] is suddenly
turned off during a packet passing b/w TCP and IP.

part of PR kern/53562
OK msaitoh
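A minimal C sketch of the simplified check, assuming (as the commit message states) that the interface advertises M_CSUM_TSOv4 in if_csum_flags_tx whenever TSO is enabled. The flag values, the helper names, and the derivation of sw_csum as "flags requested by the packet but not offered by the interface" are assumptions for illustration, not a copy of ip_output().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits; the real definitions live in the kernel headers. */
#define M_CSUM_TCPv4	0x00000001u
#define M_CSUM_TSOv4	0x00000100u

/*
 * Hypothetical helper: returns nonzero when the packet asks for TSOv4
 * but the interface no longer offers it, so software must take over.
 */
static int
tso4_needs_fallback(uint32_t pkt_csum_flags, uint32_t if_csum_flags_tx)
{
	/* Work left for software: requested by the packet, not offloaded. */
	uint32_t sw_csum = pkt_csum_flags & ~if_csum_flags_tx;

	return (sw_csum & M_CSUM_TSOv4) != 0;
}

/* Small worked example of both outcomes. */
static void
tso4_example(void)
{
	/* Interface still has TSO enabled: nothing to do in software. */
	assert(!tso4_needs_fallback(M_CSUM_TSOv4, M_CSUM_TSOv4 | M_CSUM_TCPv4));
	/* TSO was turned off after TCP built the packet: fall back. */
	assert(tso4_needs_fallback(M_CSUM_TSOv4, M_CSUM_TCPv4));
}
```

A single mask test covers the rare case in which TSO is disabled while a packet is in flight between TCP and IP.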
 1.307 11-Jul-2018  maxv Rename

ip_undefer_csum -> in_undefer_cksum
in_delayed_cksum -> in_undefer_cksum_tcpudp

The two previous names were inconsistent and misleading.

Put the two functions into in_offload.c. Add comments to explain what
we're doing.

The same could be done for IPv6.
 1.306 02-Jun-2018  maxv branches: 1.306.2;
Copy more mbuf flags.
 1.305 29-May-2018  maxv Fix an XXX of mine, be clearer about what we're doing. Basically we want to
preserve the fragment offset and flags. That's necessary if the packet
we're fragmenting is itself a fragment.
 1.304 29-Apr-2018  maxv Remove unused and misleading argument from ipsec_set_policy.
 1.303 21-Apr-2018  maxv Remove #ifndef __vax__.

The check enforces a 4-byte-aligned size for the option mbuf. If the size
is not multiple of 4, the computation of ip_hl gets truncated in the
output path. There is no reason for this check not to be present on VAX.

While here add a KASSERT in ip_insertoptions to enforce the assumption.

Discussed briefly on tech-net@
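
The truncation can be seen with a small arithmetic sketch. It assumes the usual 20-byte IPv4 header and a header-length field counted in 32-bit words; the helper names are hypothetical and this is not the kernel code.

```c
#include <stddef.h>

#define EX_IP_HDR_LEN	20	/* IPv4 header without options, in bytes */

/* The header length field counts 32-bit words, so the shift drops any remainder. */
static unsigned
ex_ip_hl(size_t optlen)
{
	return (unsigned)((EX_IP_HDR_LEN + optlen) >> 2);
}

/* Bytes of options that the header length field fails to cover. */
static size_t
ex_lost_bytes(size_t optlen)
{
	return (EX_IP_HDR_LEN + optlen) - ((size_t)ex_ip_hl(optlen) << 2);
}
```

For 8 bytes of options nothing is lost; for 6 bytes the field covers only 24 of the 26 header bytes.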
 1.302 13-Apr-2018  maxv Remove useless comment and style.
 1.301 13-Apr-2018  maxv Reduce the diff between similar blocks.
 1.300 13-Apr-2018  maxv Reorder a few instructions to clarify. Replace two bcopy by memcpy.
 1.299 30-Mar-2018  maya correct typo: and and -> and (comments only)

heads up on this being a common typo from chris28.
 1.298 03-Mar-2018  maxv branches: 1.298.2;
Add KASSERTs, we don't want m_nextpkt in ipsec{4/6}_process_packet.
 1.297 27-Feb-2018  maxv Dedup: merge ipsec4_set_policy and ipsec6_set_policy. The content of the
original ipsec_set_policy function is inlined into the new one.
 1.296 27-Feb-2018  maxv Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.295 12-Feb-2018  christos Keep a pointer to the interface of the multicast membership, because the
multicast element itself might go away in in_delmulti (but the interface
can't because we hold the lock). From ozaki-r@
 1.294 07-Feb-2018  mrg ip_add_membership() has a missing {} issue, but solve it by
dropping the "goto out" that would have happened immediately
next anyway, ie, should be NFC.
 1.293 06-Feb-2018  maxv Several changes, mostly cosmetic:

* Add a KASSERT in ip_output(), we expect (at least) the IP header to be
here.

* In ip_fragment(), declare two variables instead of recomputing the
values each time. Add an XXX for ipoff, it seems to me we should also
remove IP_RF.

* Rename the arguments of ip_optcopy().

* Style: use NULL for pointers, remove ()s for return statements, and
add whitespaces for clarity.

No real functional change.
 1.292 10-Jan-2018  christos from ozaki-r: use the proper ifp.
XXX: perhaps push the lock in in_delmulti()?
 1.291 10-Jan-2018  christos - this is not python, we need braces
- protect ifp locking against NULL
 1.290 01-Jan-2018  christos Remove comment now that the getsockopt code passes the size.
 1.289 01-Jan-2018  christos 1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo
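
The size-based dispatch in item 4 can be sketched as below. This is an illustration only: struct ex_in_pktinfo is a stand-in for the real structure, the enum and function names are hypothetical, and the handling of any other size (EX_UNKNOWN_SIZE) is a placeholder that the log message does not specify.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct in_pktinfo; only its size matters here. */
struct ex_in_pktinfo {
	uint32_t	ipi_addr;
	unsigned int	ipi_ifindex;
};

enum ex_action {
	EX_RECVPKTINFO_ON,	/* Linux-style use, non-zero int */
	EX_RECVPKTINFO_OFF,	/* Linux-style use, zero int */
	EX_SET_DEFAULT_SOURCE,	/* Solaris-style use, full structure */
	EX_UNKNOWN_SIZE		/* placeholder, behaviour not described above */
};

/* Decide what a set of IP_PKTINFO means from the size of its argument. */
static enum ex_action
ex_pktinfo_setopt(const void *val, size_t len)
{
	if (len == sizeof(int)) {
		int onoff;

		memcpy(&onoff, val, sizeof(onoff));
		return onoff != 0 ? EX_RECVPKTINFO_ON : EX_RECVPKTINFO_OFF;
	}
	if (len == sizeof(struct ex_in_pktinfo))
		return EX_SET_DEFAULT_SOURCE;
	return EX_UNKNOWN_SIZE;
}
```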
 1.288 22-Dec-2017  ozaki-r Fix usage of curlwp_bind in ip_output

curlwp_bindx must be called in LIFO order, i.e., we can't call curlwp_bind
and curlwp_bindx like this:
bound1 = curlwp_bind();
bound2 = curlwp_bind();
curlwp_bindx(bound1);
curlwp_bindx(bound2);

ip_output did so if NET_MPSAFE. Fix it.
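
Why the order matters can be shown with a toy userland model. It assumes only that the bind call sets a flag and returns the previous value, and that the bindx call puts a saved value back; the model_* names are hypothetical and the kernel's actual implementation may differ in detail.

```c
#include <stdbool.h>

/* Toy stand-in for the per-LWP "bound" flag; not kernel code. */
static bool model_bound;

/* Model of the bind call: set the flag, hand back the old value. */
static bool
model_bind(void)
{
	bool old = model_bound;

	model_bound = true;
	return old;
}

/* Model of the bindx call: put a previously saved value back. */
static void
model_bindx(bool old)
{
	model_bound = old;
}

/* Release in reverse order of acquisition; returns the final flag. */
static bool
run_lifo(void)
{
	model_bound = false;
	bool bound1 = model_bind();
	bool bound2 = model_bind();

	model_bindx(bound2);
	model_bindx(bound1);
	return model_bound;
}

/* Release in acquisition order, as in the broken sequence above. */
static bool
run_non_lifo(void)
{
	model_bound = false;
	bool bound1 = model_bind();
	bool bound2 = model_bind();

	model_bindx(bound1);
	model_bindx(bound2);
	return model_bound;
}
```

In this model the LIFO sequence ends with the flag cleared, while the non-LIFO sequence first clears it too early and then leaves it set.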
 1.287 15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.286 11-Dec-2017  ryo As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.285 17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.284 10-Aug-2017  ryo Add support IP_PKTINFO for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.283 23-Jul-2017  para kmem_intr_free kmem_intr_[z]alloced memory

the underlying pools are the same but api-wise those should match
 1.282 04-Jul-2017  roy Rename u to udst, .dst to .sa and .dst4 to sin.
Create sockaddr for the source address in usrc so it won't stamp on udst.

This fixes a regression caused in r1.280
 1.281 03-Jul-2017  khorben Typo
 1.280 03-Jul-2017  roy When outputting, search for the sending address on the sending interface
rather than blindly picking the first matching address from any interface
when testing source address validity.

This allows another interface to have the same address, but be detached.
 1.279 12-May-2017  ryo branches: 1.279.2;
replace in_fmtaddr() by IN_PRINT(), and delete function in_fmtaddr()
 1.278 10-May-2017  ozaki-r Stop ipsec4_output returning SP to the caller

SP isn't used by the caller (ip_output) and also holding its
reference looks unnecessary.
 1.277 07-May-2017  christos PR/52074: Frank Kardel: current npf map directive broken
Don't filter packets that can't be resolved to source interfaces because
they could have been generated by a packet filter.
 1.276 05-Mar-2017  ozaki-r branches: 1.276.4;
Fix the position of curlwp_bindx; it should be after if_put
 1.275 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes codes of callers and IPsec a bit simple
 1.274 02-Mar-2017  ozaki-r Make sure imo_membership is protected by inp's lock (solock)
 1.273 02-Mar-2017  ozaki-r Make usages of ifp MP-safe in some functions of IP multicast
 1.272 22-Feb-2017  ozaki-r Add assertions and comments for lock states of socket and pcb
 1.271 17-Feb-2017  ozaki-r Make NOMPSAFE comments informative
 1.270 13-Feb-2017  ozaki-r Use IFQ_LOCK instead of splnet for if_snd
 1.269 16-Jan-2017  christos rename arplog -> ARPLOG to make it clear that it is a macro and tuck-in the
buffer used for address formatting.
 1.268 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.267 11-Jan-2017  ozaki-r branches: 1.267.2;
Get rid of unnecessary header inclusions
 1.266 10-Jan-2017  knakahara avoid double rtcache_unref().

reviewed by ozaki-r@n.o.
 1.265 12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.264 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.263 20-Sep-2016  roy Drop UDP packets as well as TCP without error when sending from detached or
tentative addresses.
 1.262 18-Sep-2016  christos Dealing with arplog is a bit more complicated...
 1.261 15-Sep-2016  roy Ensure that packets are sent from a valid address.
If the packet is TCP and the address is detached or tentative then
it's just dropped, otherwise an error is returned.

This is needed because you can bind to a valid address and it can then
become invalid.

This satisfies RFC 4862 section 5.5.4.
 1.260 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.259 08-Jul-2016  ozaki-r branches: 1.259.2;
Replace macros to get an IP address with proper inline functions

The inline functions are more friendly for applying psz/psref;
they consist of only simple iterations.
 1.258 21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time while we hold it via them. Thus we need to look up an ifnet
object by if_index every time to be safe.
 1.257 20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.256 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.255 09-May-2016  ozaki-r Fix compilation for ppc
 1.254 04-May-2016  christos fix compilation for ppc.
 1.253 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psrefing) rtentries.
 1.252 26-Apr-2016  ozaki-r Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes, so it looked like
it worked (unexpectedly), and removing rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.251 19-Apr-2016  ozaki-r Fix error path
 1.250 19-Apr-2016  ozaki-r Separate MPLS-related routines from ip_hresolv_output

No functional changes.
 1.249 18-Apr-2016  ozaki-r Get rid of meaningless RTF_UP check from ip_hresolv_output

The check is meaningless because
- An obtained rtentry is ensured that it's always RTF_UP by rtcache,
rtalloc1 and rtlookup. If the rtentry isn't changed (i.e., RTF_UP gets
dropped) during processing, the check should be unnecessary
- Even if not, i.e., an obtained rtentry can be changed during processing,
checking only at the point doesn't help; the rtentry can be changed after
the check

Instead we have to ensure that RTF_UP isn't dropped if someone is using it
somehow. Note that we already ensure that a rtentry being used isn't freed
by rt_refcnt.

Proposed on tech-kern and tech-net.
 1.248 20-Jan-2016  riastradh Give proper prototype to ip_output.
 1.247 02-Sep-2015  ozaki-r Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.
 1.246 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.245 07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
network. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
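
The conversion between the two time bases can be sketched as follows. This is a userland model under stated assumptions: struct ex_clock stands in for the kernel's time_second and time_uptime, and the helper names are hypothetical rather than the functions the change actually uses.

```c
#include <stdint.h>

/* Toy snapshot of the two kernel clocks, in seconds. */
struct ex_clock {
	int64_t	wall_second;	/* can leap, e.g. after settimeofday(2) */
	int64_t	uptime;		/* increases monotonically */
};

/* Convert an uptime-based timestamp to wall-clock time for userland. */
static int64_t
ex_uptime_to_wall(const struct ex_clock *c, int64_t t_uptime)
{
	return t_uptime + (c->wall_second - c->uptime);
}

/* Convert a wall-clock timestamp from userland to the uptime base. */
static int64_t
ex_wall_to_uptime(const struct ex_clock *c, int64_t t_wall)
{
	return t_wall - (c->wall_second - c->uptime);
}
```

An expiry stored on the uptime base keeps the same remaining lifetime when the wall clock leaps; only the value reported to userland changes.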
 1.244 17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
 1.243 14-Jul-2015  ozaki-r Move rt_gwroute operation out of stripoutput

We should do it in ip_hresolv_needed.
 1.242 01-Jul-2015  ozaki-r Use ip_hresolv_output for if_token as well

I thought we cannot apply ip_hresolv_output to if_token because
rt0 looked being needed by arpresolve in token_output. However,
rt0 is actually not used by arpresolve in NetBSD (see obsolete
ARPRESOLVE macro).
 1.241 08-Jun-2015  roy errno -> error, spotted by the hawk skrll
 1.240 08-Jun-2015  roy It's possible we could not have any ready addresses.
 1.239 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find out detailed investigations by dyoung about the
issue in there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.238 27-Apr-2015  ozaki-r Add missing error checks on rtcache_setdst

It can fail with ENOMEM.
 1.237 24-Apr-2015  ozaki-r KNF
 1.236 03-Apr-2015  ozaki-r Don't grab KERNEL_LOCK during if_output when NET_MPSAFE

The change makes L3 MP-safe work easy. At this point
we deal with only IP forwarding.

No functional change when NET_MPSAFE isn't enabled.
 1.235 31-Mar-2015  ozaki-r Add missing ifdef IPSEC
 1.234 23-Mar-2015  roy Add RTF_BROADCAST to mark routes used for the broadcast address when
they are created on the fly. This makes it clear what the route is for
and allows an optimisation in ip_output() by avoiding a call to
in_broadcast() because most of the time we do talk to a host.
It also avoids a needless allocation for the storage of llinfo_arp and
thus vanishes from arp(8) - it showed as incomplete anyway so this
is a nice side effect.

Guard against this and routes marked with RTF_BLACKHOLE in
ip_fastforward().
While here, guard against routes marked with RTF_BLACKHOLE in
ip6_fastforward().
RTF_BROADCAST is IPv4 only, so don't bother checking that here.
 1.233 26-Nov-2014  ozaki-r branches: 1.233.2;
Call looutput with holding KERNEL_LOCK

This fixes diagnostic assertion "KERNEL_LOCKED_P()" in if_loop.c.

PR kern/49410
 1.232 12-Oct-2014  christos Refactor the multicast membership code so that we can handle v4 mapped
addresses using the v6 membership ioctls.
 1.231 11-Oct-2014  christos expose multicast option functions which are used by the v6 code now.
 1.230 06-Jun-2014  rmind branches: 1.230.2;
ip_output: zero iproute structure only when needed; reduce the scope
of some variables.
 1.229 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.228 29-May-2014  rmind Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.227 23-May-2014  rmind Fix the assert in the previous commit.
 1.226 22-May-2014  rmind - Make ip_setmoptions(), ip_getmoptions() and ip_pcbopts() static.
- ip_output: eliminate 7th variadic argument; IP_RETURNMTU is flag
always used to store MTU size into struct inpcb::inp_errormtu.
- Clean up these routines: reduce #ifdefs, variable scopes, etc.
 1.225 17-May-2014  rmind Replace open-coded access (and boundary checking) of ifindex2ifnet with
if_byindex() function.
 1.224 29-Jun-2013  rmind branches: 1.224.4;
- Rewrite parts of pfil(9): use array to store hooks and thus be more cache
friendly (there are only few hooks in the system). Make the structures
opaque and the interface more strict.
- Remove PFIL_HOOKS option by making pfil(9) mandatory.
 1.223 27-Jun-2013  christos branches: 1.223.2;
implement IP_PKTINFO and IP_RECVPKTINFO.
 1.222 08-Jun-2013  rmind Split IPsec code in ip_input() and ip_forward() into the separate routines
ipsec4_input() and ipsec4_forward(). Tested by christos@.
 1.221 08-Jun-2013  rmind Split IPSec logic from ip_output() into a separate routine - ipsec4_output().
No change to the mechanism intended. Tested by christos@.
 1.220 05-Jun-2013  christos IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.219 04-Jun-2013  christos PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UDP sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.218 02-Feb-2013  kefren get rid of ip_len local variable. Use ntohs(ip->ip_len) like the rest
of the code in the two places this variable was used
 1.217 25-Jun-2012  christos branches: 1.217.2;
rename rfc6056 -> portalgo, requested by yamt
 1.216 22-Jun-2012  christos PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.215 30-Apr-2012  rmind - Replace some malloc(9) uses with kmem(9).
- G/C M_IPMOPTS, M_IPMADDR and M_BWMETER.
 1.214 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.213 15-Feb-2012  drochner fix for IPSEC tunnel + NAT-T + esp_frag:
Output packets larger than "esp_frag" are fragmented first
and then reinjected into ip_output for encapsulation
and transfer. The problem was that each packet got a new
ip_id value assigned, so that fragments couldn't be matched
by the receiver. Offset information was overwritten too.
approved by releng
 1.212 31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.211 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.210 31-Oct-2011  yamt branches: 1.210.2; 1.210.6;
redo ip_output.c rev.1.206 and 1.207 differently. PR/43664.
ok'ed by martin@
 1.209 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.208 14-Apr-2011  yamt after ip_input.c rev.1.285 and 1.286, restore kernel_lock for if_output.
 1.207 09-Apr-2011  martin PR kern/43664:
mlelstv pointed out that we sometimes may use checksums on loopback
interfaces. Make the test consistent with the code path selecting
the checksum operation before invoking fragmentation.
 1.206 09-Apr-2011  martin We do not do checksums on loopback interfaces, not even if fragmenting.
Fixes PR kern/43664.
 1.205 17-Jul-2009  minskim branches: 1.205.4; 1.205.6;
Add the IP_MINTTL socket option.

The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value. This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.

OK'ed by christos@.
 1.204 16-Jul-2009  minskim Add the IP_RECVTTL option support.

If the IP_RECVTTL option is enabled on a SOCK_DGRAM socket, the
recvmsg(2) call will return the TTL of the received datagram. The
msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the TTL value.

Modeled after FreeBSD implementation.
 1.203 01-Jul-2009  martin From Wolfgang Stukenbrock in PR kern/41659: add missing splx().
 1.202 06-May-2009  elad Remove some usage of "priv" and "privileged" variables and instead pass
around credentials. Also push down kauth(9) calls closer to where the
operation is done.

Mailing list reference:

http://mail-index.netbsd.org/tech-net/2009/04/30/msg001270.html
 1.201 18-Mar-2009  cegger bzero -> memset
 1.200 12-Oct-2008  plunky branches: 1.200.2; 1.200.4; 1.200.8; 1.200.10;
update ip_pcbopts() to use sockopt(9) API.

cleans up function and one small fix is that we now stop copying user
options to the mbuf when the _EOL is given, previously this function
would continue to copy options.
 1.199 12-Oct-2008  plunky do not sleep while allocating memory here as socket lock is held
 1.198 16-Aug-2008  plunky constify sockopt in the PRCO_SETOPT path
 1.197 06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.196 28-Apr-2008  martin branches: 1.196.2; 1.196.6;
Remove clause 3 and 4 from TNF licenses
 1.195 23-Apr-2008  thorpej branches: 1.195.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.194 12-Apr-2008  thorpej branches: 1.194.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.193 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.192 06-Feb-2008  matt branches: 1.192.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
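
For reference, a plain Fisher-Yates shuffle looks like the sketch below. It shows only the basic shuffle, not the sliding-window scheme of this change; the function names are hypothetical, and the small deterministic generator is a stand-in chosen for reproducibility, not something suitable for real packet IDs.

```c
#include <stddef.h>
#include <stdint.h>

/* Deterministic stand-in generator; predictable, so for illustration only. */
static uint32_t
ex_next(uint32_t *state)
{
	*state = *state * 1664525u + 1013904223u;
	return *state >> 16;
}

/* Shuffle ids[0..n-1] in place, walking from the last slot down. */
static void
ex_shuffle(uint16_t *ids, size_t n, uint32_t seed)
{
	for (size_t i = n; i > 1; i--) {
		/* Pick j with 0 <= j < i; the modulo has a slight bias. */
		size_t j = ex_next(&seed) % i;
		uint16_t tmp = ids[i - 1];

		ids[i - 1] = ids[j];
		ids[j] = tmp;
	}
}
```

Every element is swapped into its final slot exactly once, so the result is always a permutation of the input.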
 1.191 14-Jan-2008  dyoung Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
 1.190 12-Jan-2008  dyoung Good-bye, rtcache_check(). Call both rtcache_validate() and
rtcache_update(,1) instead of rtcache_check().
 1.189 29-Dec-2007  degroote Restore correctly the sp level in case of FAST_IPSEC + IPSEC_NAT_T
 1.188 29-Dec-2007  degroote Simplify the FAST_IPSEC output path
Only record an IPSEC_OUT_DONE tag when we have finished the processing
In ip{,6}_output, check this tag to know if we have already processed this
packet.
Remove some dead code (IPSEC_PENDING_TDB is not used in NetBSD)

Fix pr/36870
 1.187 21-Dec-2007  matt Add fix for ip_id information leakage. Since the leakage information is
primarily used with TCP SYN and RST packets and such packets are less than
the smallest sized packet that an IP stack is allowed to fragment, we simply
set ip_id to 0 for all packets 68 bytes or less.
 1.186 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.185 28-Nov-2007  dyoung branches: 1.185.2; 1.185.6;
Move IN_NEED_CHECKSUM() to in_offload.h for re-use.
 1.184 19-Sep-2007  dyoung branches: 1.184.6;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care the source address and, if applicable, the
port et cetera that it uses.
 1.183 02-Sep-2007  dyoung m_copym(..., 0, M_COPYALL, ...) -> m_copypacket(..., ...).
 1.182 02-Sep-2007  dyoung m_copy() was deprecated, apparently, long ago. m_copy(...) ->
m_copym(..., M_DONTWAIT).
 1.181 28-Aug-2007  cube Fix ipv4 multicast that could sometimes send packets with the wrong
Ethernet multicast address.

Reported by jmcneill@, fix discussed with dyoung@, _very_ light testing by
myself, some more money for my dealer of anxiolytics after reading
ip_output()'s twisted code maze.
 1.180 02-May-2007  dyoung branches: 1.180.2; 1.180.6; 1.180.8;
Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
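
The sockaddr helpers described in item 1 can be sketched in userland
roughly as follows. This is only an illustration, not the kernel code:
the x-prefixed names, the stand-in struct xsockaddr, and the single
supported family are all assumptions, calloc() stands in for the
per-family pool, and the 'flags' argument is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BSD-style sockaddr header. */
struct xsockaddr {
	uint8_t	sa_len;		/* total length of this address */
	uint8_t	sa_family;	/* address family */
	char	sa_data[14];	/* family-specific payload */
};

enum { XAF_INET = 2 };		/* the only family this sketch knows */

/* Size of a sockaddr for the given family, or 0 if unknown. */
static size_t
xsockaddr_size(uint8_t af)
{
	return af == XAF_INET ? sizeof(struct xsockaddr) : 0;
}

/* Allocate a zeroed sockaddr with sa_len and sa_family filled in. */
struct xsockaddr *
xsockaddr_alloc(uint8_t af)
{
	size_t len = xsockaddr_size(af);
	struct xsockaddr *sa;

	if (len == 0 || (sa = calloc(1, len)) == NULL)
		return NULL;
	sa->sa_len = (uint8_t)len;
	sa->sa_family = af;
	return sa;
}

/* Like strcpy(): copy src over dst; the families must match. */
struct xsockaddr *
xsockaddr_copy(struct xsockaddr *dst, const struct xsockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like strdup(): allocate a new sockaddr and copy src into it. */
struct xsockaddr *
xsockaddr_dup(const struct xsockaddr *src)
{
	struct xsockaddr *dst = xsockaddr_alloc(src->sa_family);

	return dst == NULL ? NULL : xsockaddr_copy(dst, src);
}

/* Return the sockaddr to its (here: malloc-backed) pool. */
void
xsockaddr_free(struct xsockaddr *sa)
{
	free(sa);
}
```

A caller that gets NULL back from xsockaddr_alloc() or xsockaddr_dup()
has to treat it as an out-of-memory condition, which mirrors why
rtcache_setdst() can fail with ENOMEM.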
 1.179 04-Mar-2007  christos branches: 1.179.2; 1.179.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.178 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.177 17-Feb-2007  dyoung branches: 1.177.2;
Join lines.
 1.176 29-Jan-2007  dyoung bzero -> memset.
 1.175 29-Jan-2007  dyoung In ip_setmoptions(), don't leave a route cache (struct route) on
the stack if we exit with EADDRNOTAVAIL.
 1.174 13-Jan-2007  joerg Unconditionally zero and free iproute. Previously, IPsec tunnel
packets, e.g. from ICMP, could end up leaking the reference in iproute,
as ipsec4_output would overwrite the ro pointer in state.

Tested by Juraj Hercek and supposed to fix PR kern/35273 and kern/35318.
 1.173 08-Jan-2007  yamt ip_output: reload ip_len after running pfil_run_hooks.
pf "fragment reassemble" rule can change it, at least.
 1.172 04-Jan-2007  elad Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.171 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.170 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
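
The chaining scheme above can be sketched in userland as follows. This
is a simplified illustration under stated assumptions, not the code
from this revision: the x-prefixed names and struct layouts are
hypothetical, the linkage is hand-rolled rather than built on
<sys/queue.h>, and the rtentry reference counting that the real
rtflush() has to do is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct xrtentry { int rt_dummy; };	/* placeholder routing entry */

struct xroute {
	struct xrtentry	 *ro_rt;	/* cached route, or NULL */
	struct xroute	 *ro_next;	/* next cache in the chain */
	struct xroute	**ro_prevp;	/* pointer that points at us */
};

struct xdomain {
	struct xroute	 *dom_rtcache;	/* chain of live route caches */
};

/* Notify the domain of a new cached route: chain it if ro_rt != NULL. */
void
xrtcache(struct xdomain *dom, struct xroute *ro)
{
	assert(ro->ro_prevp == NULL);	/* must not already be chained */
	if (ro->ro_rt == NULL)
		return;
	ro->ro_next = dom->dom_rtcache;
	ro->ro_prevp = &dom->dom_rtcache;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = &ro->ro_next;
	dom->dom_rtcache = ro;
}

/* Invalidate one cache: forget the route and unlink from the chain. */
void
xrtflush(struct xroute *ro)
{
	if (ro->ro_rt == NULL)
		return;
	ro->ro_rt = NULL;
	*ro->ro_prevp = ro->ro_next;
	if (ro->ro_next != NULL)
		ro->ro_next->ro_prevp = ro->ro_prevp;
	ro->ro_next = NULL;
	ro->ro_prevp = NULL;
}

/* Invalidate every cache in the domain, e.g. after a route is added. */
void
xrtflushall(struct xdomain *dom)
{
	while (dom->dom_rtcache != NULL)
		xrtflush(dom->dom_rtcache);
}
```

Because a flushed cache simply has ro_rt == NULL, users of a cache only
need one NULL check to decide whether a fresh lookup is required.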
 1.169 06-Dec-2006  dyoung Remove stray curly brace. Thanks, yamt!
 1.168 06-Dec-2006  dyoung KNF.
 1.167 25-Nov-2006  yamt branches: 1.167.2;
move tso-by-software code to their own files. no functional changes.
 1.166 13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
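
For illustration, the address predicates from item 3 could look like
the sketch below. The function names are hypothetical and are not the
kernel's macro definitions; addresses are taken in host byte order, and
treating "link-local multicast" as 224.0.0.0/24 is an assumption made
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for link-local unicast, 169.254/16 (host byte order). */
bool
x_in_linklocal(uint32_t a)
{
	return (a & 0xffff0000u) == 0xa9fe0000u;
}

/* True for the RFC1918 private ranges (host byte order). */
bool
x_in_private(uint32_t a)
{
	return (a & 0xff000000u) == 0x0a000000u ||	/* 10/8 */
	    (a & 0xfff00000u) == 0xac100000u ||		/* 172.16/12 */
	    (a & 0xffff0000u) == 0xc0a80000u;		/* 192.168/16 */
}

/* True for link-local unicast or (assumed) 224.0.0.0/24 multicast. */
bool
x_in_any_local(uint32_t a)
{
	return x_in_linklocal(a) || (a & 0xffffff00u) == 0xe0000000u;
}
```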
 1.165 23-Jul-2006  ad branches: 1.165.4; 1.165.6;
Use the LWP cached credentials where sane.
 1.164 12-Jul-2006  tron Remove test for M_CSUM_TSOv6 flag which is not (yet) defined in
NetBSD-current.
 1.163 12-Jul-2006  tron Add diagnostic checks for hardware-assisted checksum related flags in
the mbuf which is supposed to get sent out:
- Complain in ip_output() if any of the IPv6 related flags are set.
- Complain in ip6_output() if any of the IPv4 related flags are set.
- Complain in both functions if the flags indicate that both a TCP and
UDP checksum should be calculated by the hardware.
 1.162 15-May-2006  christos branches: 1.162.4;
kauth fallout
 1.161 14-May-2006  elad integrate kauth.
 1.160 23-Feb-2006  christos branches: 1.160.2; 1.160.4; 1.160.6;
Handle IPSEC_NAT_T in the FAST_IPSEC case.
XXX: need to fix the FAST_IPSEC code now.
 1.159 11-Dec-2005  christos branches: 1.159.2; 1.159.4; 1.159.6;
merge ktrace-lwp.
 1.158 19-Sep-2005  dyoung People have to read this code, so I am removing the double-negative
tautology, #ifndef notdef, which is not only superfluous, but easily
misread as #ifdef notyet.
 1.157 11-Sep-2005  seb Replace plain 255 by MAXTTL.
 1.156 11-Sep-2005  christos Allow the multicast_ttl and the multicast_loop options to be set with both
u_char and u_int option variables. Original patch from seb.
 1.155 18-Aug-2005  yamt - introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.154 10-Aug-2005  yamt move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.153 29-May-2005  christos branches: 1.153.2;
- add const
- remove bogus casts
- avoid nested variables
 1.152 18-Apr-2005  yamt ip_output: handle the case M_CSUM_TSOv4 but !IFCAP_TSOv4.
 1.151 18-Apr-2005  yamt fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.
 1.150 07-Apr-2005  yamt when doing TSO, avoid using duplicated ip_id heavily.
XXX ip_randomid
 1.149 11-Mar-2005  matt branches: 1.149.2;
Set ip_len to 0 in the wm driver when TSO is being used.
 1.148 10-Mar-2005  thorpej In ip_fragment():
- Use the correct IP header length variable for other-than-first packets.
- Remove redundant setting of the original IP header length in the first
packet's csum_data. (It's already set before ip_fragment() is called
in 1.147.)
 1.147 09-Mar-2005  matt Move all the hardware-assisted checksum/segment offload code together.
 1.146 06-Mar-2005  matt Add IPv4/TCP hooks for TCP Segment Offload on transmit.
 1.145 05-Mar-2005  briggs Fix checksum offload for fragmented packets. From John Heasley
on gnats-bugs in PR kern/29544.
Tested with an NFS client using default rwsize on an NFS server
with wm(4) interface configured IP4CSUM,TCP4CSUM,UDP4CSUM.
Prior revision required the server to have checksum offload disabled.
 1.144 26-Feb-2005  perry nuke trailing whitespace
 1.143 18-Feb-2005  heas My last change for pseudo-header checksums was flawed. The pseudo-header
checksum is always in the L4 header by the time we get to this point. It
was occasionally not there due to a bug in tcp_respond, which has since
been fixed.
So, instead just stash the length of the L3 header in the high 16 bits of
csum_data.
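
A minimal sketch of stashing the L3 header length in the high 16 bits
of a 32-bit csum_data word is shown below. The helper names are
hypothetical, and the sketch makes no claim about what the stack keeps
in the low 16 bits; it only preserves them.

```c
#include <assert.h>
#include <stdint.h>

/* Combine an L3 header length (high half) with the existing low half. */
uint32_t
x_csum_data_pack(uint16_t l3hdrlen, uint16_t low)
{
	return ((uint32_t)l3hdrlen << 16) | low;
}

/* Recover the L3 header length a driver would use as its start offset. */
uint16_t
x_csum_data_l3hdrlen(uint32_t csum_data)
{
	return (uint16_t)(csum_data >> 16);
}

/* Recover the untouched low 16 bits. */
uint16_t
x_csum_data_low(uint32_t csum_data)
{
	return (uint16_t)(csum_data & 0xffffu);
}
```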
 1.142 12-Feb-2005  heas For controllers (eg: hme & gem) that can only perform linear hardware checksums
(from an offset to the end of the packet), the pseudo-header checksum must be
calculated by software. So, provide it in the TCP/UDP header when
M_CSUM_NO_PSEUDOHDR is set in the interface's if_csum_flags_tx.

The start offset, the end of the IP header, is also provided in the high 16
bits of pkthdr.csum_data, so that the driver need not examine the packet
at all.

XXX At the request of Jonathan Stone, note that sharing of if_csum_flags_tx &
pkthdr.csum_flags for checksum quirks should be re-evaluated.
 1.141 12-Feb-2005  manu Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.140 03-Feb-2005  perry ANSIfy function declarations
 1.139 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.138 15-Dec-2004  thorpej branches: 1.138.2; 1.138.4;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.137 04-Dec-2004  peter Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.136 06-Oct-2004  thorpej Slight simplification to IFA_STATS handling.
 1.135 04-Sep-2004  manu IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.134 06-Jul-2004  minoura Remove broken code for now: getsockopt(s, IPPROTO_IP, IP_IPSEC_POLICY,...).
It returned EINVAL, now returns ENOPROTOOPT.
Ok'd by itojun.
 1.133 01-Jun-2004  itojun update mtu value if outgoing interface changes with ipsec ops
(draft-touch-vpn case only?) iij seil team
 1.132 18-May-2004  christos Fix buffer overrun in in_pcbopts() (FreeBSD PR/66386)
 1.131 26-Apr-2004  matt Remove #else clause of __STDC__
 1.130 02-Mar-2004  thorpej Use the new IPSEC_PCB_SKIP_IPSEC() to bypass a socket policy lookup
when possible. This shaves several cycles from the output path for
non-IPsec connections, even if the policy is cached in the PCB.
 1.129 10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
since when we have introduced dynamically-created interfaces. from kame
 1.128 19-Nov-2003  jonathan Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.
 1.127 17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.126 17-Oct-2003  enami Increment stats when packet is dropped since there is no room
to put all fragments in the interfaces's send queue. Some large
UDP packets are dropped here and administrator may want to bump ifqmaxlen.
 1.125 14-Oct-2003  itojun more correction to ip_fragment; free mbuf correctly if ENOBUFS is raised
during fragmenting.
 1.124 14-Oct-2003  itojun avoid mbuf leak on ip_fragment(); obey 4.4bsd mbuf passing rule (mbuf passed
to a function must be freed by the called function on error).
pointed out by enami
 1.123 03-Oct-2003  itojun when dropping M_PKTHDR, need to free m_tag associated with it.
 1.122 01-Oct-2003  itojun correct ip_fragment() wrt ip->ip_off handling.
do not send out incomplete fragment due to ENOBUFS (behavior change from 4.4BSD)
 1.121 19-Sep-2003  jonathan Fast-ipsec can call ip_output() with a null 'struct socket *so'
argument. So check so is non-NULL before doing the pointer-chasing
dance to find the PCB. (Unless and until we rework fast-ipsec and
KAME, to pass a struct in_pcbhdr * instead of the struct socket *).
 1.120 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.119 27-Aug-2003  itojun don't initialize m by m0, m0 is not initialized (by introduction of ip_fragment)
 1.118 23-Aug-2003  itojun need sys/domain.h for FAST_IPSEC case; jonathan
 1.117 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.116 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.115 22-Aug-2003  jonathan Change KAME code for ip_output()/ip6_output() to obtain struct socket*
from the explicit inpcb*/in6pcb* argument. set_socket() becomes redundant.
 1.114 19-Aug-2003  itojun remove unneeded #ifdef __NetBSD__
 1.113 19-Aug-2003  itojun make ip_fragment public (it is for coming PF integration)
 1.112 19-Aug-2003  christos make ip_fragment static and add prototype.
 1.111 19-Aug-2003  itojun correct ip_multicast_if fix to always set ifp (tnx Shiva)
 1.110 18-Aug-2003  itojun fix problem we can't drop membership on !IFF_UP interface.
reported by Shiva Shenoy

while we're here, fix another problem when the same interface address is
assigned to !IFF_MULTICAST and IFF_MULTICAST interface. if ip_multicast_if()
returns the first one, join/leave will fail, which is not a desired effect.
 1.109 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.108 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.107 30-Jun-2003  itojun branches: 1.107.2;
freebsd code somehow crept in
 1.106 30-Jun-2003  itojun after pfil_run_hooks, need to fix hlen as well
 1.105 26-Jun-2003  itojun tabify
 1.104 26-May-2003  yamt - don't pass mbufs with M_CSUM_* flags which isn't supported by the interface
to if_output.
- offload ip-checksumming for each fragmented packets as well.
 1.103 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.102 17-Sep-2002  darrenr From FreeBSD (1.164) courtesy of Maxim Konovalov:
"In rare cases when there is no room for ip options ip_insertoptions()
can fail and corrupt a header length. Initialize len and check what
ip_insertoptions() returns."
 1.101 11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.100 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.99 24-Jun-2002  itojun set ia as well
 1.98 24-Jun-2002  itojun do not consult routing table under the following condition:
- the destination is IPv4 multicast or 255.255.255.255, and
- outgoing interface is specified via socket option

this simplifies operation of routed
(no longer require 224.0.0.0/4 to be set up)
 1.97 09-Jun-2002  itojun whitespace
 1.96 31-May-2002  itojun since if_mtu is u_long, use u_long for mtu.
 1.95 07-Feb-2002  thorpej branches: 1.95.8; 1.95.10;
IFF_POINTTOPOINT interfaces can also transmit packets to broadcast
destinations.
 1.94 06-Feb-2002  thorpej ip_mloopback(): process the delayed checksum on the copy, not
the original mbuf.
 1.93 31-Jan-2002  itojun correct bad ip checksum on multicast loopback packet. PR14597
 1.92 22-Jan-2002  itojun make sure to check address family on route cache. with IPv4 mapped
address we can see both AF_INET/INET6.
 1.91 08-Jan-2002  itojun don't panic when there's no interface address exist for the specified multicast
outgoing interface (ia == NULL after IFP_TO_IA).

historic behavior (up to revision 1.43) was to use 0.0.0.0 as source address,
but it seems like a mistake according to RFC1112/1122.
 1.90 21-Nov-2001  itojun update outgoing ifp, only if tunnel mode ipsec is used. this is to
honor IP_MULTICAST_IF setsockopt on ipsec-over-multicast. sync with kame
 1.89 13-Nov-2001  lukem add RCSIDs
 1.88 17-Sep-2001  thorpej Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.87 11-Aug-2001  yamt branches: 1.87.2;
fix cksum error of udp and tcp packet with ip options
 1.86 02-Jun-2001  thorpej branches: 1.86.2;
Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
 1.85 26-May-2001  ragge Remove one #ifdef vax, bugfix another. Should probably be #ifdef i386 also.
 1.84 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.83 27-Feb-2001  itojun branches: 1.83.2;
remove obsolete #if 0'ed section
(IPsec and DF bit interaction - the code was incorrect anyways)
 1.82 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.81 13-Jan-2001  itojun allow IP_MULTICAST_IF and IP_ADD/DROP_MEMBERSHIP to specify interface
by interface index. if the interface address specified is in 0.0.0.0/8
it will be considered as interface index in network byteorder.

getsockopt(IP_MULTICAST_IF) preserves old behavior if
setsockopt(IP_MULTICAST_IF) was done with interface address, and
returns interface index if setsockopt(IP_MULTICAST_IF) was done with
interface index (again using the form in 0.0.0.0/8).

Suggested by Dave Thaler, based on RIPv2 MIB spec (RFC1724 section 3.3).

http://mail-index.netbsd.org/tech-net/2001/01/13/0003.html
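
The encoding described above, an interface index carried as an address
inside 0.0.0.0/8 in network byte order, can be sketched like this. The
helper names are hypothetical, and treating 0.0.0.0 itself as "no
index" is an assumption of this example rather than something stated
in the commit message.

```c
#include <arpa/inet.h>	/* htonl, ntohl */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode an interface index as an s_addr value in 0.0.0.0/8. */
uint32_t
x_ifindex_to_addr(uint32_t ifindex)
{
	return htonl(ifindex & 0x00ffffffu);
}

/* True if the s_addr value (network order) carries an index. */
bool
x_addr_is_ifindex(uint32_t s_addr)
{
	uint32_t h = ntohl(s_addr);

	return h != 0 && (h & 0xff000000u) == 0;
}

/* Extract the index from an s_addr value that carries one. */
uint32_t
x_addr_to_ifindex(uint32_t s_addr)
{
	return ntohl(s_addr) & 0x00ffffffu;
}
```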
 1.80 13-Jan-2001  itojun on getsockopt(IP_IPSEC_POLICY), make sure to initialize len
 1.79 11-Nov-2000  thorpej Actually, our local ip_off variable isn't needed.
 1.78 11-Nov-2000  thorpej Restructure the PFIL_HOOKS mechanism a bit:
- All packets are passed to PFIL_HOOKS as they come off the wire, i.e.
fields in protocol headers in network order, etc.
- Allow for multiple hooks to be registered, using a "key" and a "dlt".
The "dlt" is a BPF data link type, indicating what type of header is
present.
- INET and INET6 register with key == AF_INET or AF_INET6, and
dlt == DLT_RAW.
- PFIL_HOOKS now take an argument for the filter hook, an mbuf **,
an ifnet *, and a direction (PFIL_IN or PFIL_OUT), thus making them
less IP (really, IP Filter) centric.

Maintain compatibility with IP Filter by adding wrapper functions for
IP Filter.
 1.77 23-Oct-2000  itojun fix IFA_STATS.
- use hashed in_ifaddr lookup.
- correct endianness.
 1.76 17-Oct-2000  thorpej Add an IP_MTUDISC flag to the flags that can be passed to
ip_output(). This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.

This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.

Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
 1.75 28-Jun-2000  mrg remove include of <vm/vm.h>
 1.74 10-May-2000  itojun branches: 1.74.4;
add missing boundary checks to ip options processing.
correct timestamp option validation (len and ptr upper/lower bound
based on RFC791).
fill "pointer" field for parameter problem in timestamp option processing.
 1.73 13-Apr-2000  is Copy M_BCAST and M_MCAST flags when fragmenting a packet (else
Multicast packets won't be send to the correct link layer address
by the interface driver).
By Artur Grabowski, PR 9772.
 1.72 31-Mar-2000  jdolecek Since last duplicate prototype cleanup, we need to include
<netinet/ip_mroute.h> to get ip_mforward() prototype if MROUTING
is defined.
 1.71 30-Mar-2000  augustss Remove register declarations.
 1.70 22-Mar-2000  itojun tabify a line.
 1.69 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.68 20-Feb-2000  darrenr pass "struct pfil_head *" to pfil_add_hook and pfil_remove hook rather
than "struct protosw *".
 1.67 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.66 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.65 20-Dec-1999  itojun avoid shared cluster mbuf overwrite on multicast packet loopback.
(bsdi and freebsd fixed this a long time ago...)

PR: 9020
From: pavlin@catarina.usc.edu
 1.64 13-Dec-1999  is Handle packets to 255.255.255.255 like multicast packets. Fixes PR 7682 by
Darren Reed.
 1.63 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.62 09-Jul-1999  thorpej branches: 1.62.2; 1.62.8;
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.61 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.60 07-Jun-1999  mrg oops. move sendit: above the PFIL_HOOKS so that multicast traffic is filtered. from darren reed.
 1.59 04-May-1999  hwr Don't let packets with a Class-D source address escape the host.
Fixes second half of kern/7003 by Jonathan Stone <jonathan@DSG.Stanford.EDU>.
 1.58 27-Mar-1999  aidan branches: 1.58.2; 1.58.4; 1.58.6;
Added per-addr input/output statistics. Currently just support netatalk
and netinet, currently only tested under netinet.

Disabled by default, enabled by compiling the kernel with option
IFA_STATS. Enabling this feature seems to make the ip_output function
take 13% longer than before, which should be OK for people that need
this feature.
 1.57 12-Mar-1999  perry exterminate ovbcopy. patches provided by Erik Bertelsen, pr-7145
 1.56 19-Jan-1999  mycroft There's just no plausible reason to byte-swap ip_id internally. It's opaque.
 1.55 11-Jan-1999  thorpej Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.
 1.54 19-Dec-1998  thorpej Reverse the copyright-notice-swap. It went against existing practice.
 1.53 26-Oct-1998  ws branches: 1.53.4;
Fix a buglet when looking up an interface for multicast:
Zero out the routing structure before calling the route lookup code
in order to correctly match addresses.
 1.52 20-Oct-1998  matt vax -> __vax__ (and mips to __mips__ in ultrix_misc.c)
 1.51 30-Sep-1998  tls Switch order of TNF and UCB copyrights so UCB copyright is first; this seems more appropriate since UCB wrote the original code, after all.
 1.50 09-Aug-1998  mrg defopt PFIL_HOOKS.
 1.49 17-Jul-1998  sommerfe Fix PR5508: ipfil cut-through forwarding causes panic
 1.48 28-Apr-1998  matt Only transmit fragments if the send queue of interface can actually hold
all of the fragments. Use the mtu of route in preference of the MTU of the
interface when doing fragmentation decisions. (ie. Fragment to the path
mtu if it is available).
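
The two decisions described above can be sketched as plain arithmetic.
This is an illustration only: the helper names are hypothetical, a
route MTU of 0 is assumed to mean "not available", and the caller is
assumed to pass ip_len >= hlen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Prefer the route (path) MTU; fall back to the interface MTU. */
size_t
x_effective_mtu(size_t route_mtu, size_t if_mtu)
{
	return route_mtu != 0 ? route_mtu : if_mtu;
}

/*
 * Number of packets needed to send ip_len bytes (header included)
 * with the given header length and MTU; 0 if the MTU cannot carry
 * even one 8-byte unit of payload.
 */
size_t
x_frag_count(size_t ip_len, size_t hlen, size_t mtu)
{
	size_t per_frag, payload;

	if (ip_len <= mtu)
		return 1;		/* fits without fragmenting */
	if (mtu < hlen + 8)
		return 0;
	/* Non-final fragments carry a multiple of 8 payload bytes. */
	per_frag = (mtu - hlen) & ~(size_t)7;
	payload = ip_len - hlen;
	return (payload + per_frag - 1) / per_frag;
}

/* Transmit only if the send queue can hold every fragment. */
bool
x_queue_has_room(size_t ifq_len, size_t ifq_maxlen, size_t nfrags)
{
	return nfrags != 0 && ifq_len <= ifq_maxlen &&
	    ifq_maxlen - ifq_len >= nfrags;
}
```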
 1.47 24-Mar-1998  kml Ensure that we take the IP option length into account when we calculate
the effective maximum send size for TCP. ip_optlen() and tcp_optlen()
should probably be inlined for efficiency.
 1.46 19-Mar-1998  mrg convert pfil(9) in and out lists from <sys/queue.h> LISTs to TAILQs, and
change pfil_add_hook to put output filters at the tail of the queue,
while continuing to place input filters at the head of the queue. update
the two users of these functions, and document these changes.

fixes PR#4593.
 1.45 15-Feb-1998  tls Add correct copyright notice for IP address hash change. This code is donated to TNF by the original copyright holder, Panix.
 1.44 13-Feb-1998  tls Change list of interface IP addresses to a hash. Improves performance on hosts with a large number of IP addresses significantly.
 1.43 13-Feb-1998  kleink Fix variable declarations: register -> register int.
 1.42 12-Jan-1998  scottr Use option header file for MROUTING
 1.41 07-Jan-1998  lukem add the following, derived from FreeBSD:
* IP_PORTRANGE socket option, which controls how the ephemeral ports
are allocated. it takes the following settings:
IP_PORTRANGE_DEFAULT use anonportmin (49152) -> anonportmax (65535)
IP_PORTRANGE_HIGH as IP_PORTRANGE_DEFAULT (retained for FreeBSD
compat reasons, where these are separate)
IP_PORTRANGE_LOW use 600 -> 1023. only works if uid==0.
* in_pcb flag INP_ANONPORT. set if port was allocated ephemerally
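The range selection described in revision 1.41 can be sketched as a small user-space model. This is not the kernel code: the enum `portrange`, the struct `port_bounds`, and the function `portrange_bounds` are hypothetical names chosen for illustration; only the port numbers and the uid==0 restriction come from the log entry above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IP_PORTRANGE_* settings. */
enum portrange { PORTRANGE_DEFAULT, PORTRANGE_HIGH, PORTRANGE_LOW };

struct port_bounds {
	uint16_t lo;
	uint16_t hi;
};

/*
 * Pick the ephemeral port range for a setting. Returns false when the
 * low range is requested by a non-root caller (uid != 0).
 */
static bool
portrange_bounds(enum portrange setting, unsigned uid, struct port_bounds *out)
{
	assert(out != NULL);
	switch (setting) {
	case PORTRANGE_DEFAULT:
	case PORTRANGE_HIGH:	/* same as default, kept for FreeBSD compat */
		out->lo = 49152;
		out->hi = 65535;
		return true;
	case PORTRANGE_LOW:
		if (uid != 0)
			return false;
		out->lo = 600;
		out->hi = 1023;
		return true;
	}
	return false;
}
```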
 1.40 14-Oct-1997  matt branches: 1.40.2;
Add support for returning maximum supported MTU when ip_output fails with
EMSGSIZE.
 1.39 15-Apr-1997  christos branches: 1.39.4;
Move the mtod calls *after* we've made sure that the packet has passed the
filter successfully. Otherwise it can be NULL if the filter blocked it,
and we die. How did this ever work?
 1.38 18-Feb-1997  mrg pseudo-device ipfilter brings in PFIL_HOOKS.
 1.37 11-Jan-1997  thorpej branches: 1.37.4;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
 1.36 20-Dec-1996  mrg always reassign ip after calling function.
 1.35 20-Dec-1996  mrg in pfil_hooks: always reassign ip after calling hook.
 1.34 22-Oct-1996  veego Fix a panic from the pfil_hooks.
 1.33 11-Oct-1996  is Fix a mbuf leak in ip_output().

Scenario: If ip_insertoptions() prepends a new mbuf to the chain, the
bad: label's m_freem(m0) still would free only the original mbuf chain
if the transmission failed for, e.g., no route to host; resulting in
one lost mbuf per failed packet. (The original posting included a
demonstration program).

Original report of this bug was by jinmei@isl.rdc.toshiba.co.jp
(JINMEI Tatuya) on comp.bugs.4bsd.
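The leak scenario from revision 1.33 can be illustrated with a toy buffer chain. This is a user-space sketch under stated assumptions, not the mbuf code: `struct buf`, `buf_prepend`, `chain_free`, and `failed_send_leak` are hypothetical names, and the counter `live_bufs` exists only to make the leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: just a singly linked node. */
struct buf {
	struct buf *next;
};

static int live_bufs;	/* buffers allocated and not yet freed */

/* Allocate a new buffer in front of chain and return the new head. */
static struct buf *
buf_prepend(struct buf *chain)
{
	struct buf *b = malloc(sizeof(*b));
	if (b == NULL)
		abort();
	b->next = chain;
	live_bufs++;
	return b;
}

/* Free every buffer reachable from chain. */
static void
chain_free(struct buf *chain)
{
	while (chain != NULL) {
		struct buf *next = chain->next;
		free(chain);
		live_bufs--;
		chain = next;
	}
}

/*
 * Model of a failing send: an options buffer is prepended, then the
 * send fails. free_current_head selects whether the error path frees
 * the current head (fixed) or the stale original pointer (leaky).
 * Returns the number of buffers leaked by the error path.
 */
static int
failed_send_leak(int free_current_head)
{
	live_bufs = 0;
	struct buf *m0 = buf_prepend(NULL);	/* original packet */
	struct buf *m = buf_prepend(m0);	/* options prepended */
	chain_free(free_current_head ? m : m0);
	int leaked = live_bufs;
	if (!free_current_head)
		free(m);	/* tidy up the model itself */
	return leaked;
}
```

Freeing through the stale pointer leaves exactly the prepended buffer behind, which matches the "one lost mbuf per failed packet" symptom in the report.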
 1.32 14-Sep-1996  mrg move the packet filter hooks in to a saner location. while i'm here, rename
PACKET_FILTER to PFIL_HOOKS.
 1.31 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.30 06-Sep-1996  mrg add packet filter interface code. see pfil(9) for more details. you
need the PACKET_FILTER option to enable this code. currently, ipfilter
version 3.1.1-beta has been converted to use this new interface.
 1.29 26-Feb-1996  mrg branches: 1.29.4;
two more local addr changes, all done differently now (idea from charles)
 1.28 13-Feb-1996  christos netinet prototypes
 1.27 01-Jul-1995  cgd null mbuf pointer could cause system crash; avoid it. From
Torsten Duwe <duwe@immd4.informatik.uni-erlangen.de>.
 1.26 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.25 04-Jun-1995  mycroft Don't cast things unnecessarily.
 1.24 04-Jun-1995  mycroft Clean up many more casts.
 1.23 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.22 15-May-1995  cgd simplify ip_output() out-of-memory condition slightly, and style nits.
 1.21 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.20 11-Apr-1995  mycroft Remove some explicit references to loif.
 1.19 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.18 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.17 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.16 19-Jan-1994  brezak Fix arguments to ip_getmoptions.
 1.15 18-Jan-1994  brezak Fix some prototype detected warnings/errors.
 1.14 18-Jan-1994  brezak Patch for ip-multicast bugs from mccanne@ee.lbl.gov (Steven McCanne)
 1.13 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.12 09-Jan-1994  mycroft Prototype the rest.
 1.11 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.10 07-Jan-1994  cgd kill COMPAT_OLDSOCKOPT
 1.9 06-Jan-1994  ws Apparently no one ever tested the COMPAT_OLDSOCKOPT flag...
 1.8 18-Dec-1993  mycroft Canonicalize all #includes.
 1.7 06-Dec-1993  cgd oops; fix that last...
 1.6 06-Dec-1993  cgd the ugliest compatibility hack i think i've ever seen...
define COMPAT_OLDSOCKOPT to get new kernels to work with the
old args to [sg]sockopt. this is going to go away "soon".
note that this option only has effect if MULTICAST is not defined.
 1.5 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.4 05-Nov-1993  cgd fix from david greenman, davidg@freefall.cdrom.com:
fixed bug where large amounts of unidirectional UDP traffic would fill
the interface output queue and further udp packets would be fragmented
and only partially sent - keeping the output queue full and jamming the
network, but not actually getting any real work done (because you can't
send just 'part' of a udp packet - if you fragment it, you must send
the whole thing). The fix involves adding a check to make sure that the
output queue has sufficient space for all of the fragments.
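The all-or-nothing check described in revision 1.4 (and revisited in 1.48) can be sketched as follows. This is a minimal user-space model, not the kernel code: `fragment_count`, `fragments_fit`, and the plain integer queue counters are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: number of fragments needed to carry payload_len
 * bytes when each fragment may hold at most (mtu - hlen) payload bytes,
 * rounded down to a multiple of 8 as IPv4 fragment offsets require.
 */
static size_t
fragment_count(size_t payload_len, size_t mtu, size_t hlen)
{
	assert(mtu >= hlen + 8);
	size_t per_frag = (mtu - hlen) & ~(size_t)7;
	return (payload_len + per_frag - 1) / per_frag;
}

/*
 * Hypothetical model of the all-or-nothing rule: only start sending
 * fragments if the interface send queue has room for every one of them.
 */
static bool
fragments_fit(size_t queue_len, size_t queue_max,
    size_t payload_len, size_t mtu, size_t hlen)
{
	size_t free_slots = queue_max > queue_len ? queue_max - queue_len : 0;
	return fragment_count(payload_len, mtu, hlen) <= free_slots;
}
```

With a 1500-byte MTU and a 20-byte header, a 4000-byte payload needs three fragments, so a queue with only two free slots would reject the whole datagram rather than send part of it.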
 1.3 22-May-1993  cgd branches: 1.3.4;
add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.2 06-Nov-1993  mycroft Merge changes from trunk.
 1.3.4.1 16-Oct-1993  mycroft Nuke references to machine/mtpr.h.
 1.29.4.1 11-Dec-1996  mycroft From trunk:
Fix a mbuf leak when fragmentation fails due to lack of memory.
 1.37.4.1 12-Mar-1997  is Merge in changes from Trunk
 1.39.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.40.2.4 29-Oct-1998  cgd pull up rev 1.53 from trunk (ws)
 1.40.2.3 01-Oct-1998  cgd pull up revisions 1.44-1.45, 1.51 (via patch) from trunk. (tls)
 1.40.2.2 22-Jul-1998  mellon Pull up 1.46 and 1.49 (veego)
 1.40.2.1 09-May-1998  mycroft Pull up patch from kml.
 1.53.4.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.58.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.58.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.58.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.58.4.3 02-Aug-1999  thorpej Update from trunk.
 1.58.4.2 01-Jul-1999  thorpej Sync w/ -current.
 1.58.4.1 21-Jun-1999  thorpej Sync w/ -current.
 1.58.2.3 30-Apr-2000  he Pull up revision 1.73 (requested by is):
Pass M_BCAST and M_MCAST flags to fragments. Fixes PR#9772.
 1.58.2.2 20-Dec-1999  he Pull up revision 1.65 (requested by itojun):
Avoid panic caused by shared cluster mbuf overwrite on multicast
packet loopback for packets with certain sizes. Fixes PR#9020.
 1.58.2.1 22-Jun-1999  perry pullup 1.59->1.60 (mrg): ipfilter should filter multicast traffic...
 1.62.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.62.2.6 21-Apr-2001  bouyer Sync with HEAD
 1.62.2.5 12-Mar-2001  bouyer Sync with HEAD.
 1.62.2.4 11-Feb-2001  bouyer Sync with HEAD.
 1.62.2.3 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.62.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.62.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.74.4.4 04-Aug-2003  msaitoh Pull up revision 1.106-1.107 (requested by itojun in ticket #53):
after pfil_run_hooks, need to fix hlen as well.
freebsd code somehow crept in.
 1.74.4.3 15-Dec-2002  he Pull up revision 1.102 (requested by darrenr):
Initialize len and check what ip_insertoptions() returns.
In some rare cases there might not be sufficient room for
the options.
 1.74.4.2 14-Jan-2002  he Pull up revision 1.91 (requested by itojun):
Avoid kernel panic on IPv4 multicast packet transmission if there
is no IPv4 address assigned to the specified outgoing interface.
 1.74.4.1 06-Apr-2001  he Pull up revision 1.82 (via patch, requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.83.2.16 20-Sep-2002  thorpej Sync with HEAD.
 1.83.2.15 17-Sep-2002  nathanw Catch up to -current.
 1.83.2.14 27-Aug-2002  nathanw Catch up to -current.
 1.83.2.13 01-Aug-2002  nathanw Catch up to -current.
 1.83.2.12 12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.83.2.11 24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.83.2.10 20-Jun-2002  nathanw Catch up to -current.
 1.83.2.9 28-Feb-2002  nathanw Catch up to -current.
 1.83.2.8 11-Jan-2002  nathanw More catchup.
 1.83.2.7 08-Jan-2002  nathanw Catch up to -current.
 1.83.2.6 14-Nov-2001  nathanw Catch up to -current.
 1.83.2.5 21-Sep-2001  nathanw Catch up to -current.
 1.83.2.4 24-Aug-2001  nathanw Catch up with -current.
 1.83.2.3 21-Jun-2001  nathanw Catch up to -current.
 1.83.2.2 13-Mar-2001  nathanw Be more careful not to dereference curproc when there might not be
a process context.
 1.83.2.1 05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.86.2.7 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.86.2.6 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.86.2.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.86.2.4 16-Mar-2002  jdolecek Catch up with -current.
 1.86.2.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.86.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.86.2.1 25-Aug-2001  thorpej Merge Aug 24 -current into the kqueue branch.
 1.87.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.95.10.3 30-Jun-2003  grant Pull up revisions 1.106-1.107 (requested by itojun in ticket #1358):

after pfil_run_hooks, need to fix hlen as well

freebsd code somehow crept in
 1.95.10.2 01-Nov-2002  tron Pull up revision 1.98-1.99 (requested by itojun in ticket #356):
do not consult routing table under the following condition:
- the destination is IPv4 multicast or 255.255.255.255, and
- outgoing interface is specified via socket option
this simplifies operation of routed
(no longer require 224.0.0.0/4 to be set up)
 1.95.10.1 30-Sep-2002  lukem Pull up revision 1.102 (requested by darrenr in ticket #842):
From FreeBSD (1.164) courtesy of Maxim Konovalov:
"In rare cases when there is no room for ip options ip_insertoptions()
can fail and corrupt a header length. Initialize len and check what
ip_insertoptions() returns."
 1.95.8.3 29-Aug-2002  gehenna catch up with -current.
 1.95.8.2 15-Jul-2002  gehenna catch up with -current.
 1.95.8.1 20-Jun-2002  gehenna catch up with -current.
 1.107.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.107.2.10 01-Apr-2005  skrll Sync with HEAD.
 1.107.2.9 08-Mar-2005  skrll Sync with HEAD.
 1.107.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.107.2.7 15-Feb-2005  skrll Sync with HEAD.
 1.107.2.6 04-Feb-2005  skrll Sync with HEAD.
 1.107.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.107.2.4 19-Oct-2004  skrll Sync with HEAD
 1.107.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.107.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.107.2.1 03-Aug-2004  skrll Sync with HEAD
 1.138.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.138.4.1 12-Feb-2005  yamt sync with head.
 1.138.2.1 29-Apr-2005  kent sync with -current
 1.149.2.5 31-Mar-2007  bouyer Pull up following revision(s) (requested by joerg in ticket #1734):
sys/netinet/ip_output.c: revision 1.167.2.2
Unconditionally zero and free iproute. Before IPsec tunnel packets e.g.
from ICMP could end up in leaking the reference in iproute, as
ipsec4_output would overwrite the ro pointer in state.
Tested by Juraj Hercek and supposed to fix PR kern/35273 and kern/35318.
 1.149.2.4 28-Jan-2007  tron Pull up following revision(s) (requested by yamt in ticket #1656):
sys/netinet/ip_output.c: revision 1.173
ip_output: reload ip_len after running pfil_run_hooks.
pf "fragment reassemble" rule can change it, at least.
 1.149.2.3 21-Oct-2005  riz branches: 1.149.2.3.2; 1.149.2.3.4;
Pull up following revision(s) (requested by seb in ticket #903):
sys/netinet/ip_output.c: revisions 1.156 - 1.157
Allow the multicast_ttl and the multicast_loop options to be set with both
u_char and u_int option variables. Original patch from seb.
 1.149.2.2 06-May-2005  tron Pull up revision 1.151 (requested by yamt in ticket #251):
fix problems related to loopback interface checksum omission. PR/29971.
- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)
ok'ed by Jason Thorpe.
 1.149.2.1 13-Apr-2005  tron Pull up revision 1.150 (requested by yamt in ticket #145):
when doing TSO, avoid heavy reuse of duplicated ip_id values.
XXX ip_randomid
 1.149.2.3.4.1 28-Jan-2007  tron Pull up following revision(s) (requested by yamt in ticket #1656):
sys/netinet/ip_output.c: revision 1.173
ip_output: reload ip_len after running pfil_run_hooks.
pf "fragment reassemble" rule can change it, at least.
 1.149.2.3.2.1 28-Jan-2007  tron Pull up following revision(s) (requested by yamt in ticket #1656):
sys/netinet/ip_output.c: revision 1.173
ip_output: reload ip_len after running pfil_run_hooks.
pf "fragment reassemble" rule can change it, at least.
 1.153.2.8 11-Feb-2008  yamt sync with head.
 1.153.2.7 21-Jan-2008  yamt sync with head
 1.153.2.6 07-Dec-2007  yamt sync with head
 1.153.2.5 27-Oct-2007  yamt sync with head.
 1.153.2.4 03-Sep-2007  yamt sync with head.
 1.153.2.3 26-Feb-2007  yamt sync with head.
 1.153.2.2 30-Dec-2006  yamt sync with head.
 1.153.2.1 21-Jun-2006  yamt sync with head.
 1.159.6.2 01-Jun-2006  kardel Sync with head.
 1.159.6.1 22-Apr-2006  simonb Sync with head.
 1.159.4.2 09-Sep-2006  rpaulo sync with head
 1.159.4.1 07-Feb-2006  rpaulo sotoinpcb_hdr -> sotoinpcb.
 1.159.2.1 01-Mar-2006  yamt sync with head.
 1.160.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.160.4.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.160.4.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.160.2.2 11-Aug-2006  yamt sync with head
 1.160.2.1 24-May-2006  yamt sync with head.
 1.162.4.1 13-Jul-2006  gdamore Merge from HEAD.
 1.165.6.2 18-Dec-2006  yamt sync with head.
 1.165.6.1 10-Dec-2006  yamt sync with head.
 1.165.4.3 01-Feb-2007  ad Sync with head.
 1.165.4.2 12-Jan-2007  ad Sync with head.
 1.165.4.1 18-Nov-2006  ad Sync with head.
 1.167.2.2 28-Mar-2007  jdc Pull up revision 1.174 (requested by joerg in ticket #524).

Unconditionally zero and free iproute. Before IPsec tunnel packets e.g.
from ICMP could end up in leaking the reference in iproute, as
ipsec4_output would overwrite the ro pointer in state.

Tested by Juraj Hercek and supposed to fix PR kern/35273 and kern/35318.
 1.167.2.1 18-Jan-2007  tron Pull up following revision(s) (requested by yamt in ticket #361):
sys/netinet/ip_output.c: revision 1.173
ip_output: reload ip_len after running pfil_run_hooks.
pf "fragment reassemble" rule can change it, at least.
 1.177.2.4 07-May-2007  yamt sync with head.
 1.177.2.3 12-Mar-2007  rmind Sync with HEAD.
 1.177.2.2 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.177.2.1 17-Feb-2007  yamt file ip_output.c was added on branch yamt-idlelwp on 2007-02-27 16:54:56 +0000
 1.179.4.1 11-Jul-2007  mjf Sync with head.
 1.179.2.2 09-Oct-2007  ad Sync with head.
 1.179.2.1 08-Jun-2007  ad Sync with head.
 1.180.8.3 23-Mar-2008  matt sync with HEAD
 1.180.8.2 09-Jan-2008  matt sync with HEAD
 1.180.8.1 06-Nov-2007  matt sync with HEAD
 1.180.6.3 03-Dec-2007  joerg Sync with HEAD.
 1.180.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.180.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.180.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.184.6.3 18-Feb-2008  mjf Sync with HEAD.
 1.184.6.2 27-Dec-2007  mjf Sync with HEAD.
 1.184.6.1 08-Dec-2007  mjf Sync with HEAD.
 1.185.6.2 19-Jan-2008  bouyer Sync with HEAD
 1.185.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.185.2.1 26-Dec-2007  ad Sync with head.
 1.192.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.192.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.192.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.194.2.1 18-May-2008  yamt sync with head.
 1.195.2.5 19-Aug-2009  yamt sync with head.
 1.195.2.4 18-Jul-2009  yamt sync with head.
 1.195.2.3 16-May-2009  yamt sync with head
 1.195.2.2 04-May-2009  yamt sync with head.
 1.195.2.1 16-May-2008  yamt sync with head.
 1.196.6.1 19-Oct-2008  haad Sync with HEAD.
 1.196.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.200.10.1 09-Jul-2009  snj branches: 1.200.10.1.2;
Pull up following revision(s) (requested by martin in ticket #847):
sys/netinet/ip_output.c: revision 1.203
From Wolfgang Stukenbrock in PR kern/41659: add missing splx().
 1.200.10.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.200.8.2 23-Jul-2009  jym Sync with HEAD.
 1.200.8.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.200.4.1 09-Jul-2009  snj Pull up following revision(s) (requested by martin in ticket #847):
sys/netinet/ip_output.c: revision 1.203
From Wolfgang Stukenbrock in PR kern/41659: add missing splx().
 1.200.2.1 28-Apr-2009  skrll Sync with HEAD.
 1.205.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.205.4.1 21-Apr-2011  rmind sync with head
 1.210.6.3 02-Jun-2012  mrg sync to latest -current.
 1.210.6.2 05-Apr-2012  mrg sync to latest -current.
 1.210.6.1 18-Feb-2012  mrg merge to -current.
 1.210.2.4 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.210.2.3 30-Oct-2012  yamt sync with head
 1.210.2.2 23-May-2012  yamt sync with head.
 1.210.2.1 17-Apr-2012  yamt sync with head
 1.217.2.4 03-Dec-2017  jdolecek update from HEAD
 1.217.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.217.2.2 23-Jun-2013  tls resync from head
 1.217.2.1 25-Feb-2013  tls resync with head
 1.223.2.3 17-Oct-2013  rmind Eliminate some of the splsoftnet() calls, misc clean up.
 1.223.2.2 28-Aug-2013  rmind sync with head
 1.223.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.224.4.1 10-Aug-2014  tls Rebase.
 1.230.2.1 01-Dec-2014  martin Pull up following revision(s) (requested by ozaki-r in ticket #277):
sys/netinet/ip_output.c: revision 1.233
Call looutput with holding KERNEL_LOCK
This fixes diagnostic assertion "KERNEL_LOCKED_P()" in if_loop.c.
PR kern/49410
 1.233.2.10 28-Aug-2017  skrll Sync with HEAD
 1.233.2.9 05-Feb-2017  skrll Sync with HEAD
 1.233.2.8 05-Oct-2016  skrll Sync with HEAD
 1.233.2.7 09-Jul-2016  skrll Sync with HEAD
 1.233.2.6 29-May-2016  skrll Sync with HEAD
 1.233.2.5 22-Apr-2016  skrll Sync with HEAD
 1.233.2.4 19-Mar-2016  skrll Sync with HEAD
 1.233.2.3 22-Sep-2015  skrll Sync with HEAD
 1.233.2.2 06-Jun-2015  skrll Sync with HEAD
 1.233.2.1 06-Apr-2015  skrll Sync with HEAD
 1.259.2.4 20-Mar-2017  pgoyette Sync with HEAD
 1.259.2.3 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.259.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.259.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.267.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.276.4.2 19-May-2017  pgoyette Resolve conflicts from previous merge (all resulting from $NetBSD
keyword expansion)
 1.276.4.1 11-May-2017  pgoyette Sync with HEAD
 1.279.2.7 18-Mar-2018  martin Pull up following revision(s) (requested by tih in ticket #639):
sys/kern/uipc_socket.c: revision 1.258
sys/kern/uipc_socket.c: revision 1.259
sys/netinet/ip_input.c: revision 1.364 (via patch)
sys/netinet/ip_output.c: revision 1.289
sys/netinet/in.h: revision 1.102
sys/netinet/in_pcb.c: revision 1.181
share/man/man9/sockopt.9: revision 1.11
sys/netinet/in_pcb.h: revision 1.65
sys/sys/socketvar.h: revision 1.146
sys/kern/uipc_syscalls.c: revision 1.189
sys/netinet/ip_output.c: revision 1.290
share/man/man4/ip.4: revision 1.41
share/man/man4/ip.4: revision 1.42
sys/kern/uipc_syscalls.c: revision 1.190

pass valsize for getsockopt like we do for setsockopt
make sure that we have enough space, don't require the exact size
(Tom Ivar Helbekkmo)

1) "#define ipi_spec_dst ipi_addr" in <netinet/in.h>
2) Change the IP_RECVPKTINFO option to control the generation of
IP_PKTINFO control messages, the way it's done in Solaris.
3) Remove the superfluous IP_RECVPKTINFO control message.
4) Change the IP_PKTINFO option to do different things depending on
the parameter it's supplied with:
- If it's sizeof(int), assume it's being used as in Linux:
- If it's non-zero, turn on the IP_RECVPKTINFO option.
- If it's zero, turn off the IP_RECVPKTINFO option.
- If it's sizeof(struct in_pktinfo), assume it's being used as in
Solaris, to set a default for the source interface and/or
source address for outgoing packets on the socket.
5) Return what Linux or Solaris compatible code expects, depending
on data size, and just added a fallback to a Linux (and current NetBSD)
compatible value if the size is unknown (as it is now), or,
in the future, if the calling application specifies a receiving
buffer that doesn't match either data item.

From: Tom Ivar Helbekkmo

new sentence-new line

Remove comment now that the getsockopt code passes the size.

Add a new sockopt member to keep track of the actual size of the option
that should be returned to the caller in getsockopt(2).
(Tom Ivar Helbekkmo)
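The size-based dispatch for the IP_PKTINFO option (item 4 in the pulled-up change above) can be sketched as a user-space model. This is not the kernel implementation: `struct model_pktinfo`, `enum pktinfo_action`, and `classify_pktinfo` are hypothetical, and treating any other size as invalid is an assumption, since the log does not say how unexpected sizes are handled on the set path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct in_pktinfo; layout is illustrative only. */
struct model_pktinfo {
	unsigned int addr;
	unsigned int ifindex;
};

enum pktinfo_action {
	PKTINFO_RECV_ON,	/* int-sized, non-zero: Linux-style enable */
	PKTINFO_RECV_OFF,	/* int-sized, zero: Linux-style disable */
	PKTINFO_SET_DEFAULT,	/* struct-sized: Solaris-style default source */
	PKTINFO_INVALID		/* any other size (assumption) */
};

/* Decide what a set-option call means from the size of its argument. */
static enum pktinfo_action
classify_pktinfo(const void *optval, size_t optlen)
{
	assert(optval != NULL);
	if (optlen == sizeof(int)) {
		int v;
		memcpy(&v, optval, sizeof(v));
		return v != 0 ? PKTINFO_RECV_ON : PKTINFO_RECV_OFF;
	}
	if (optlen == sizeof(struct model_pktinfo))
		return PKTINFO_SET_DEFAULT;
	return PKTINFO_INVALID;
}
```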
 1.279.2.6 19-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #557):
sys/netinet/ip_output.c: 1.295
Keep a pointer to the interface of the multicast membership, because the
multicast element itself might go away in in_delmulti (but the interface
can't because we hold the lock). From ozaki-r@
 1.279.2.5 13-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #494):
sys/netinet/ip_output.c: revision 1.291-1.292
- this is not python, we need braces
- protect ifp locking against NULL
--
from ozaki-r: use the proper ifp.
XXX: perhaps push the lock in in_delmulti()?
 1.279.2.4 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #463):
sys/netinet/in.c: revision 1.212
sys/netinet/ip_output.c: revision 1.288
sys/netinet6/in6.c: revision 1.256
sys/netinet6/in6_pcb.c: revision 1.163
sys/sys/lwp.h: revision 1.176
Add missing curlwp_bindx
--
Add missing curlwp_bindx
--
Check LP_BOUND is surely set in curlwp_bindx
This may find an extra call of curlwp_bindx.
--
Fix usage of curlwp_bind in ip_output
curlwp_bindx must be called in LIFO order, i.e., we can't call curlwp_bind
and curlwp_bindx like this:
bound1 = curlwp_bind();
bound2 = curlwp_bind();
curlwp_bindx(bound1);
curlwp_bindx(bound2);
ip_output did so if NET_MPSAFE. Fix it.
--
Fix wrong usage of psref_held
We can't use it for checking if a caller does NOT hold a given target.
If you want to do it you should have psref_not_held or something.
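The LIFO requirement for curlwp_bind/curlwp_bindx noted above can be demonstrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code: `model_pflag`, `MODEL_LP_BOUND`, `model_bind`, `model_bindx`, and `nested_bind` are hypothetical, and the model assumes that bind returns the previous state of the bound flag and that bindx restores it after checking the flag is currently set.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_LP_BOUND 0x1	/* hypothetical stand-in for LP_BOUND */

static int model_pflag;		/* stand-in for the per-LWP flag word */

/* Set the bound flag and return its previous value. */
static int
model_bind(void)
{
	int bound = model_pflag & MODEL_LP_BOUND;
	model_pflag |= MODEL_LP_BOUND;
	return bound;
}

/*
 * Restore the saved value. Returns false if the flag was not set on
 * entry, which models the "LP_BOUND is surely set" check.
 */
static bool
model_bindx(int bound)
{
	if ((model_pflag & MODEL_LP_BOUND) == 0)
		return false;
	model_pflag ^= bound ^ MODEL_LP_BOUND;
	return true;
}

/*
 * Nested bind/unbind. lifo selects the release order. Returns true if
 * no check tripped and the flag ended up cleared again.
 */
static bool
nested_bind(bool lifo)
{
	bool ok1, ok2;

	model_pflag = 0;
	int bound1 = model_bind();
	int bound2 = model_bind();
	if (lifo) {
		ok2 = model_bindx(bound2);
		ok1 = model_bindx(bound1);
	} else {
		ok1 = model_bindx(bound1);
		ok2 = model_bindx(bound2);
	}
	return ok1 && ok2 && model_pflag == 0;
}
```

In the non-LIFO order the first release clears the flag early, so the second release finds it unset and the check trips.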
 1.279.2.3 02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone yields little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.279.2.2 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.279.2.1 07-Jul-2017  martin Pull up following revision(s) (requested by roy in ticket #100):
sys/netinet/ip_output.c: revision 1.280
sys/netinet/ip_output.c: revision 1.282
When outputting, search for the sending address on the sending interface
rather than blindly picking the first matching address from any interface
when testing source address validity.
This allows another interface to have the same address, but be detached.
Rename u to udst, .dst to .sa and .dst4 to sin.
Create sockaddr for the source address in usrc so it won't stamp on udst.
This fixes a regression caused in r1.280
 1.298.2.7 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.298.2.6 28-Jul-2018  pgoyette Sync with HEAD
 1.298.2.5 25-Jun-2018  pgoyette Sync with HEAD
 1.298.2.4 02-May-2018  pgoyette Synch with HEAD
 1.298.2.3 22-Apr-2018  pgoyette Sync with HEAD
 1.298.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.298.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.306.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.306.2.1 10-Jun-2019  christos Sync with HEAD
 1.324.2.3 29-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1140):

sys/netinet/ip_output.c: revision 1.330
sys/netinet/sctp_output.c: revision 1.39
sys/netinet/ip_mroute.c: revision 1.166
sys/netipsec/ipsecif.c: revision 1.24
sys/netipsec/xform_ipip.c: revision 1.80
sys/netinet/ip_output.c: revision 1.327
sys/netinet/ip_output.c: revision 1.328
sys/netinet/ip_input.c: revision 1.406
sys/netinet/ip_output.c: revision 1.329
sys/netinet/in_var.h: revision 1.105

in: get rid of unused argument from ip_newid() and ip_newid_range()

in: take a reference of ifp on IP_ROUTETOIF
The ifp could be released after ia4_release(ia).

in: narrow the scope of ifa in ip_output (NFC)

sctp: follow the recent change of ip_newid()

in: avoid racy ifa_acquire(rt->rt_ifa) in ip_output()
If a rtentry is being destroyed asynchronously, the ifa referenced by rt_ifa
can be destroyed, and calling ifa_acquire(rt->rt_ifa) aborts with a
KASSERT failure. Fortunately, the ifa is not actually freed because of
the reference held by rt_ifa, so it remains usable (except by some functions
such as psref) as long as the rtentry is held.
PR kern/59527

in: avoid racy ia4_acquire(ifatoia(rt->rt_ifa)) in ip_rtaddr()
Same as the case of ip_output(), it's racy and should be avoided.
PR kern/59527
 1.324.2.2 21-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #903):

sys/netinet/ip_output.c: revision 1.326

Again allow multicast packets to be sent from unnumbered interfaces.
 1.324.2.1 25-Apr-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #150):

sys/netinet/ip_output.c: revision 1.325

Revert "Fix panic on packet sending via a route with rt_ifa of AF_LINK."

The fix was mistakenly upstreamed.
 1.326.6.1 02-Aug-2025  perseant Sync with HEAD
 1.5 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.4 23-Jul-2004  martti branches: 1.4.2;
Upgraded IPFilter to 4.1.3
 1.3 21-Apr-2004  itojun kill some strcpy
 1.2 28-Mar-2004  martti branches: 1.2.2;
Sync with official IPFilter
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.2 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.1 28-Mar-2004  martti Import IPFilter 4.1.1
 1.2.2.1 13-Aug-2004  jmc branches: 1.2.2.1.2;
Pullup rev 1.3-1.4 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.2.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.5 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.4.2.6 19-Oct-2004  skrll Sync with HEAD
 1.4.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.4.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.4.2.3 05-Aug-2004  skrll Fix merge mistakes.
 1.4.2.2 03-Aug-2004  skrll Sync with HEAD
 1.4.2.1 23-Jul-2004  skrll file ip_pool.c was added on branch ktrace-lwp on 2004-08-03 10:54:41 +0000
 1.2 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 4.1.1
 1.1.1.1.6.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.1.1.1.4.5 19-Oct-2004  skrll Sync with HEAD
 1.1.1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.1.4.1 28-Mar-2004  skrll file ip_pool.h was added on branch ktrace-lwp on 2004-08-03 10:54:41 +0000
 1.2 02-Oct-2004  christos These are ipfilter files, although they don't have the same copyright.
Thanks jaromir.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.2 23-Jul-2004  martti branches: 1.1.1.2.2;
Import IPFilter 4.1.3
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.2;
Import IPFilter 4.1.1
 1.1.1.2.2.6 19-Oct-2004  skrll Sync with HEAD
 1.1.1.2.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.2.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.2.2.3 05-Aug-2004  skrll Fix merge mistakes.
 1.1.1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.2.2.1 23-Jul-2004  skrll file ip_pptp_pxy.c was added on branch ktrace-lwp on 2004-08-03 10:54:41 +0000
 1.1.1.1.2.1 13-Aug-2004  jmc branches: 1.1.1.1.2.1.2;
Pullup rev 1.1.1.2 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.1.1.1.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.6 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
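
The difference between a sizeof-based and an alignof-based check can be seen in a small sketch. The EX_* macros and struct ex_hdr below are hypothetical stand-ins, not the NetBSD definitions, and the sketch uses the C11 alignof spelling rather than __alignof. It assumes a common ABI on which uint32_t has 4-byte alignment, so the struct is 12 bytes long but only needs 4-byte alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the NetBSD macros. */
#define EX_ALIGNED_BY_SIZEOF(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)
#define EX_ALIGNED_BY_ALIGNOF(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)

/* A non-primitive type whose size (12) is not a power of two. */
struct ex_hdr {
	uint32_t a, b, c;
};

/* 16-byte-aligned storage so that offsets into it are predictable. */
static alignas(16) unsigned char ex_buf[32];
```

Under the stated ABI assumption, EX_ALIGNED_BY_ALIGNOF(ex_buf + 8, struct ex_hdr) is true, because offset 8 is a multiple of 4. EX_ALIGNED_BY_SIZEOF on the same pointer is false, because sizeof(struct ex_hdr) - 1 is 11, which is not a valid alignment mask.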
 1.5 17-Feb-2021  christos - pass the alignment instead of the mask (as Roy asked and to match the
other macro)
- use alignof to determine that alignment and CTASSERT what we expect
- remove unused macros
 1.4 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.3 28-Apr-2008  martin branches: 1.3.4; 1.3.102;
Remove clause 3 and 4 from TNF licenses
 1.2 23-Apr-2008  thorpej branches: 1.2.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.1 12-Apr-2008  thorpej branches: 1.1.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
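
The general idea of per-CPU counters that are summed only when read can be sketched in userland C as follows. Every name here (ex_stats, ex_stat_inc, ex_stats_collate, EX_NCPU) is hypothetical, and the sketch passes the CPU index explicitly. It is not the kernel's mechanism and leaves out everything the kernel needs to find the current CPU's counters safely.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NCPU   4	/* hypothetical CPU count */
#define EX_NSTATS 3	/* hypothetical number of counters */

enum { EX_STAT_RX, EX_STAT_TX, EX_STAT_DROP };

/* One private row of counters per CPU; a CPU only writes its own row. */
static uint64_t ex_stats[EX_NCPU][EX_NSTATS];

/* Fast path: bump a counter in the given CPU's row only. */
static void
ex_stat_inc(unsigned int cpu, unsigned int stat)
{
	ex_stats[cpu][stat]++;
}

/* Slow path: sum every CPU's row into out when a reader asks for totals. */
static void
ex_stats_collate(uint64_t out[EX_NSTATS])
{
	for (unsigned int s = 0; s < EX_NSTATS; s++) {
		out[s] = 0;
		for (unsigned int c = 0; c < EX_NCPU; c++)
			out[s] += ex_stats[c][s];
	}
}
```

The update path touches a single row and needs no shared lock, while the cost of adding the rows together is paid only by the comparatively rare reader.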
 1.1.2.1 18-May-2008  yamt sync with head.
 1.2.2.1 16-May-2008  yamt sync with head.
 1.3.102.1 03-Apr-2021  thorpej Sync with HEAD.
 1.3.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.4.1 28-Apr-2008  mjf file ip_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:24 +0000
 1.39 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.38 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.37 28-Mar-2004  martti branches: 1.37.2;
Upgraded IPFilter to 4.1.1
 1.36 19-Sep-2002  martti branches: 1.36.6;
Resync with official IPF
 1.35 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.34 09-Jun-2002  itojun whitespace
 1.33 02-May-2002  martti branches: 1.33.2; 1.33.4;
Fix compilation problems
 1.32 02-May-2002  martti Upgraded IPFilter to 3.4.27
 1.31 01-Apr-2002  jdolecek Disable the H.323 proxy again - it's too buggy to be a supported option
for now. Suggested by Matthew Green and Bernd Ernesti.
 1.30 01-Apr-2002  jdolecek put back ip_h323_pxy.c - the QNX licence seems to be okay upon
further examination
 1.29 14-Mar-2002  martti Removed unused proxy file
 1.28 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.27 24-Jan-2002  martti Re-sync with IPFilter
 1.26 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.25 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.24 13-Nov-2001  lukem add RCSIDs
 1.23 05-Feb-2001  chs branches: 1.23.2; 1.23.4;
expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.22 11-May-2000  veego branches: 1.22.4;
Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.21 03-May-2000  veego Resolve conflicts.
 1.20 30-Mar-2000  augustss Remove register declarations.
 1.19 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.18 24-Aug-1999  bouyer branches: 1.18.2; 1.18.8;
Fix for kern/7831 from Darren Reed after discussion on tech-net 2 weeks ago:
check that the packet is of the right protocol before giving it to the
proxy module, otherwise let the ipnat code handle it.
What happens in kern/7831 is that a router sends back an icmp message for
a TCP SYN, and ip_proxy.c forwards it to ip_ftp_pxy.c which can only
handle TCP packets. The icmp message is properly handled by ipnat, no need to
go to ip_ftp_pxy.c.
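
The check described in this entry amounts to comparing the packet's IP protocol with the one protocol the proxy understands before dispatching. The sketch below uses hypothetical types and a hypothetical function name; it is not the IPFilter code, only an illustration of the guard.

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>

/* Hypothetical types for illustration; not the IPFilter structures. */
struct ex_proxy {
	const char *name;
	uint8_t proto;		/* the only IP protocol this proxy parses */
};

struct ex_packet {
	uint8_t proto;		/* protocol field from the IP header */
};

/*
 * Return 1 if the packet may be handed to the proxy, 0 if it should be
 * left to the ordinary NAT path instead.
 */
static int
ex_proxy_accepts(const struct ex_proxy *pxy, const struct ex_packet *pkt)
{
	return pxy->proto == pkt->proto;
}
```

With a TCP-only proxy, a packet whose protocol is IPPROTO_ICMP is rejected by this guard, which matches the kern/7831 scenario of an ICMP reply to a TCP SYN.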
 1.17 02-Feb-1999  cjs branches: 1.17.2;
Remove SCCS markers and make these compile in $NetBSD$ IDs.
 1.16 23-Jan-1999  mycroft Fix problems with fr_tcpsum() that prevented the FTP proxy from working.
 1.15 22-Nov-1998  mrg merge ipf 3.2.10
 1.14 12-Jul-1998  veego Resolve conflicts from the import.
 1.13 29-May-1998  veego Fix some compiler warnings: Missing prototype and ()'s.
 1.12 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.11 17-May-1998  veego Resolve conflicts
 1.10 28-Nov-1997  darrenr don't free pointer to static struct. please pullup.
 1.9 25-Nov-1997  mrg fixes for memory leaks in proxying, and byte ordering problems. from darren reed.
 1.8 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.7 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.6 21-Sep-1997  veego branches: 1.6.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.5 06-Jul-1997  thorpej branches: 1.5.2;
Restore original RCS IDs.
 1.4 05-Jul-1997  darrenr fix conflicts from import
 1.3 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.2 27-May-1997  thorpej Make this compile on 32-bit architectures:
- Deal with lame gcc -Wuninitialized warning (which is incorrect)
- Add parens around assignments within conditionals.
 1.1 26-May-1997  darrenr branches: 1.1.1;
Initial revision
 1.1.1.19 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.18 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.17 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.16 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.15 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.14 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.13 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.12 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.11 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.10 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.9 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.8 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.7 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.6 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.5 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.4 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.3 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.2 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.1 26-May-1997  darrenr Import new sources for 3.2alpha7
(blah, someone want to clean away /cvsroot/sys/netinet ?)
 1.5.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.6.2.6 24-Nov-1998  cgd pull up rev(s) 1.15 from trunk (ipfilter 3.2.10). (mrg)
 1.6.2.5 22-Jul-1998  mellon Pull up 1.14 (veego)
 1.6.2.4 28-Nov-1997  mellon Pull rev 1.10 up from trunk (darren)
 1.6.2.3 25-Nov-1997  mrg pull up from trunk: fixes for memory leaks in proxying, and byte ordering problems. from darren reed.
 1.6.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.6.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.17.2.2 20-Dec-1999  he Pull up revision 1.19 (requested by darrenr):
Update IPF to version 3.3.5.
 1.17.2.1 24-Aug-1999  he Pull up revision 1.18:
Check the protocol before forwarding to proxy module. Fixes
PR#7831. (bouyer/darrenr)
 1.18.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.18.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.18.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.22.4.2 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.22.4.1 09-Feb-2002  he Pull up revisions 1.23-1.27 (requested by martti):
Updated IPFilter to 3.4.23.
 1.23.4.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.23.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.23.4.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.23.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.23.2.8 20-Sep-2002  thorpej Sync with HEAD.
 1.23.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.23.2.6 04-May-2002  thorpej Update from trunk.
 1.23.2.5 17-Apr-2002  nathanw Catch up to -current.
 1.23.2.4 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.23.2.3 28-Feb-2002  nathanw Catch up to -current.
 1.23.2.2 08-Jan-2002  nathanw Catch up to -current.
 1.23.2.1 14-Nov-2001  nathanw Catch up to -current.
 1.33.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.33.2.1 20-Jun-2002  gehenna catch up with -current.
 1.36.6.2 19-Oct-2004  skrll Sync with HEAD
 1.36.6.1 03-Aug-2004  skrll Sync with HEAD
 1.37.2.1 13-Aug-2004  jmc branches: 1.37.2.1.2;
Pullup rev 1.38 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.37.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.39 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.20 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.19 28-Mar-2004  martti branches: 1.19.4;
Upgraded IPFilter to 4.1.1
 1.18 19-Sep-2002  martti branches: 1.18.6;
Upgraded IPFilter to 3.4.29
 1.17 24-Jan-2002  martti branches: 1.17.10;
Upgraded IPFilter to 3.4.23
 1.16 26-Mar-2001  mike branches: 1.16.2;
Resolve conflicts.
 1.15 11-May-2000  veego branches: 1.15.4; 1.15.6;
Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.14 03-May-2000  veego Resolve conflicts.
 1.13 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.12 22-Nov-1998  mrg branches: 1.12.4; 1.12.10; 1.12.16;
merge ipf 3.2.10
 1.11 17-May-1998  veego Resolve conflicts
 1.10 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.9 25-Nov-1997  mrg fixes for memory leaks in proxying, and byte ordering problems. from darren reed.
 1.8 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.7 21-Sep-1997  veego branches: 1.7.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.6 06-Jul-1997  thorpej branches: 1.6.2;
Restore original RCS IDs.
 1.5 05-Jul-1997  darrenr fix conflicts from import
 1.4 28-May-1997  thorpej Change the aps_tout member of struct ap_session from time_t to u_long
so that it can be passed to the filter rule aging functions, which
expect a pointer to a u_long. (time_t is an int on the alpha.)
 1.3 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several are "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.2 27-May-1997  thorpej Make this compile on 32-bit architectures:
- Add prototypes.
- Add a forward-decl to avoid a cyclic dependency graph.
 1.1 26-May-1997  darrenr branches: 1.1.1;
Initial revision
 1.1.1.16 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.15 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.14 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.13 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.12 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.11 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.10 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.9 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.8 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.7 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.6 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.5 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.4 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.3 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.2 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.1 26-May-1997  darrenr Import new sources for 3.2alpha7
(blah, someone want to clean away /cvsroot/sys/netinet ?)
 1.6.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.7.2.4 24-Nov-1998  cgd pull up rev(s) 1.12 from trunk (ipfilter 3.2.10). (mrg)
 1.7.2.3 22-Jul-1998  mellon Pull up 1.11 (veego)
 1.7.2.2 25-Nov-1997  mrg pull up from trunk: fixes for memory leaks in proxying, and byte ordering problems. from darren reed.
 1.7.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.12.16.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.12.10.2 27-Mar-2001  bouyer Sync with HEAD.
 1.12.10.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.4.1 20-Dec-1999  he Pull up revision 1.13 (requested by darrenr):
Update IPF to version 3.3.5.
 1.15.6.3 20-Sep-2002  thorpej Sync with HEAD.
 1.15.6.2 28-Feb-2002  nathanw Catch up to -current.
 1.15.6.1 09-Apr-2001  nathanw Catch up with -current.
 1.15.4.2 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.15.4.1 09-Feb-2002  he Pull up revisions 1.16-1.17 (requested by martti):
Updated IPFilter to 3.4.23
 1.16.2.2 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.16.2.1 11-Feb-2002  jdolecek Sync w/ -current.
 1.17.10.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.18.6.4 19-Oct-2004  skrll Sync with HEAD
 1.18.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.18.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.18.6.1 03-Aug-2004  skrll Sync with HEAD
 1.19.4.1 06-Feb-2005  jmc Pull up revision 1.20 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.11 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.10 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.9 28-Mar-2004  martti branches: 1.9.2;
Upgraded IPFilter to 4.1.1
 1.8 24-Jan-2002  martti branches: 1.8.16;
Upgraded IPFilter to 3.4.23
 1.7 13-Nov-2001  lukem add RCSIDs
 1.6 26-Mar-2001  mike branches: 1.6.2;
Resolve conflicts.
 1.5 11-May-2000  veego branches: 1.5.4; 1.5.6; 1.5.8;
Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.4 03-May-2000  veego Resolve conflicts.
 1.3 01-Feb-2000  veego Resolve conflicts.
 1.2 28-Dec-1999  darrenr update ipfilter code to 3.3.6
 1.1 12-Dec-1999  veego branches: 1.1.1;
Initial revision
 1.1.1.9 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.8 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.7 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.6 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.5 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.4 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.3 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.2 28-Dec-1999  darrenr update DARRENR branch of netinet to 3.3.6
 1.1.1.1 12-Dec-1999  veego branches: 1.1.1.1.2; 1.1.1.1.4;
Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.1.4.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.1.1.1.2.3 08-Jan-2000  he Pull up revision 1.2 (requested by darrenr):
Update IPF to version 3.3.6.
 1.1.1.1.2.2 20-Dec-1999  he Pull up revision 1.1.1.1 (new) (requested by darrenr):
Update IPF to version 3.3.5.
 1.1.1.1.2.1 12-Dec-1999  he file ip_raudio_pxy.c was added on branch netbsd-1-4 on 1999-12-20 21:02:06 +0000
 1.5.8.3 28-Feb-2002  nathanw Catch up to -current.
 1.5.8.2 14-Nov-2001  nathanw Catch up to -current.
 1.5.8.1 09-Apr-2001  nathanw Catch up with -current.
 1.5.6.3 27-Mar-2001  bouyer Sync with HEAD.
 1.5.6.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.6.1 11-May-2000  bouyer file ip_raudio_pxy.c was added on branch thorpej_scsipi on 2000-11-20 18:10:33 +0000
 1.5.4.2 13-Feb-2002  he Apply patch (requested by he):
Bump first argument to __KERNEL_RCSID for these files, to allow
kernels with IPF to build on a.out systems. Fixes PR#15589.
 1.5.4.1 09-Feb-2002  he Pull up revisions 1.6-1.8 (requested by martti):
Updated IPFilter to 3.4.23
 1.6.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.6.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.8.16.2 19-Oct-2004  skrll Sync with HEAD
 1.8.16.1 03-Aug-2004  skrll Sync with HEAD
 1.9.2.1 13-Aug-2004  jmc branches: 1.9.2.1.2;
Pullup rev 1.10 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.9.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.11 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.12 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.11 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.10 28-Mar-2004  martti branches: 1.10.2;
Upgraded IPFilter to 4.1.1
 1.9 24-Jan-2002  martti branches: 1.9.16;
Re-sync with IPFilter
 1.8 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.7 13-Nov-2001  lukem add RCSIDs
 1.6 26-Mar-2001  mike branches: 1.6.2;
Resolve conflicts.
 1.5 09-Aug-2000  veego branches: 1.5.2; 1.5.4;
Resolve conflicts.
 1.4 11-May-2000  veego branches: 1.4.4;
Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.3 03-May-2000  veego Resolve conflicts.
 1.2 30-Mar-2000  augustss Remove register declarations.
 1.1 12-Dec-1999  veego branches: 1.1.1;
Initial revision
 1.1.1.8 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.7 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.6 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.5 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.4 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.3 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.2 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.1 12-Dec-1999  veego branches: 1.1.1.1.2; 1.1.1.1.4;
Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.1.4.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.1.1.1.2.2 20-Dec-1999  he Pull up revision 1.1.1.1 (new) (requested by darrenr):
Update IPF to version 3.3.5.
 1.1.1.1.2.1 12-Dec-1999  he file ip_rcmd_pxy.c was added on branch netbsd-1-4 on 1999-12-20 21:02:07 +0000
 1.4.4.3 13-Feb-2002  he Apply patch (requested by he):
Bump first argument to __KERNEL_RCSID for these files, to allow
kernels with IPF to build on a.out systems. Fixes PR#15589.
 1.4.4.2 09-Feb-2002  he Pull up revisions 1.6-1.9 (requested by martti):
Updated IPFilter to 3.4.23
 1.4.4.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Casper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Casper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.5.4.3 28-Feb-2002  nathanw Catch up to -current.
 1.5.4.2 14-Nov-2001  nathanw Catch up to -current.
 1.5.4.1 09-Apr-2001  nathanw Catch up with -current.
 1.5.2.3 27-Mar-2001  bouyer Sync with HEAD.
 1.5.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.5.2.1 09-Aug-2000  bouyer file ip_rcmd_pxy.c was added on branch thorpej_scsipi on 2000-11-20 18:10:33 +0000
 1.6.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.6.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.9.16.2 19-Oct-2004  skrll Sync with HEAD
 1.9.16.1 03-Aug-2004  skrll Sync with HEAD
 1.10.2.1 13-Aug-2004  jmc branches: 1.10.2.1.2;
Pullup rev 1.11 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.10.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.12 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.23 31-May-2022  andvar fix various typos in comments, documentation and messages.
 1.22 16-Feb-2022  andvar fix various typos, mainly in comments.
 1.21 12-Oct-2018  maxv Force ip_off to zero when the reassembly is complete. This was lost in my
rev1.19 - before that the IP struct was clobbered for the reassembly, but
it actually implicitly guaranteed that the first fragment of the packet
would end up with ip_off = 0, and this was a desired behavior.
 1.20 17-Sep-2018  maxv Kick fragments that would introduce several !MFFs in a reassembly chain.

The problem arises if we receive three fragments of the kind

3. A -> has MFF
1. B -> doesn't have MFF
2. C -> doesn't have MFF

Because of the received order B->C->A, we don't see that B is !MFF, and
therefore that there is a problem in this chain.

Now we do two checks, and drop us if:

* there is a fragment preceding us, and this fragment is !MFF, or
* there is a fragment following us, and we are !MFF

Spotted a long time ago.
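
The two checks can be sketched as follows. This is a minimal illustration,
not the code in ip_reass.c: struct frag and frag_mff_ok() are hypothetical
names, and the neighbours are assumed to be the entries that would sit
directly before and after the new fragment in the offset-ordered list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment entry; the real structure has more fields. */
struct frag {
	uint16_t	off;	/* offset of this fragment, in bytes */
	bool		mff;	/* "more fragments" flag was set */
};

/*
 * Decide whether a new fragment may be linked between prev and next
 * (either may be NULL). Returns false if the fragment must be dropped.
 */
static bool
frag_mff_ok(const struct frag *prev, const struct frag *frag,
    const struct frag *next)
{
	/* A preceding fragment that is !MFF claims to be the last one. */
	if (prev != NULL && !prev->mff)
		return false;
	/* We claim to be the last one, yet something follows us. */
	if (next != NULL && !frag->mff)
		return false;
	return true;
}
```

With the B->C->A order above, C is rejected as soon as it arrives, because
its predecessor B is already !MFF.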
 1.19 17-Sep-2018  maxv Hold ip_off and ip_len in the fragment entry, instead of always reading
the associated mbuf (and converting to host order). This reduces the
cache/TLB misses when processing long lists.
 1.18 10-Jul-2018  maxv Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.
 1.17 15-May-2018  maxv branches: 1.17.2;
When reassembling IPv4/IPv6 packets, ensure each fragment has been subject
to the same IPsec processing. That is to say, that all fragments are ESP,
or AH, or AH+ESP, or none.

The reassembly mechanism can be used both on the wire and inside an IPsec
tunnel, so we need to make sure all fragments of a packet were received
on only one side.

Even though I haven't tried, I believe there are configurations where it
would be possible for an attacker to inject an unencrypted fragment into a
legitimate stream of already-decrypted-and-authenticated fragments.

Typically on IPsec gateways with ESP tunnels, where we can encapsulate
fragments (as opposed to the general case, where we fragment encapsulated
data).

Note, for the record: a funnier thing, under IPv4, would be to send a
zero-sized !MFF fragment at the head of the packet, and manage to trigger
an ICMP error; M_DECRYPTED gets lost by the reassembly, and ICMP will reply
with the packet in clear (not encrypted).
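
A rough sketch of the idea, under stated assumptions: the queue remembers
the IPsec state of its first fragment and refuses any later fragment whose
state differs. The names F_ESP, F_AH, struct reass_queue and
queue_accepts() are invented for this illustration and are not the
identifiers used in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define F_ESP	0x1	/* hypothetical: fragment went through ESP */
#define F_AH	0x2	/* hypothetical: fragment went through AH */

/* Hypothetical per-packet reassembly queue state. */
struct reass_queue {
	bool	have_first;	/* a fragment has been queued already */
	uint8_t	ipsec;		/* IPsec flags shared by queued fragments */
};

/*
 * Returns true if a fragment with the given IPsec flags may join the
 * queue. The first fragment sets the expected flags; every later
 * fragment must match them exactly (ESP, AH, AH+ESP, or none).
 */
static bool
queue_accepts(struct reass_queue *q, uint8_t ipsec)
{
	ipsec &= F_ESP | F_AH;
	if (!q->have_first) {
		q->have_first = true;
		q->ipsec = ipsec;
		return true;
	}
	return q->ipsec == ipsec;
}
```

In this sketch an unencrypted fragment (flags 0) offered to a queue that
started with an ESP fragment is simply refused.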
 1.16 03-May-2018  maxv Rename m_pkthdr_remove -> m_remove_pkthdr, to match the existing naming
convention, eg m_copy_pkthdr and m_move_pkthdr.
 1.15 11-Apr-2018  maxv Add 'static', like the prototype.
 1.14 09-Mar-2018  maxv Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:

m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);

m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
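
The invariant this change restores (only the first mbuf of a chain carries
a packet header) can be sketched with a toy structure. Nothing below is
the kernel's mbuf API: struct toy_mbuf, TOY_PKTHDR, toy_cat() and
toy_chain_len() are made up for illustration only.

```c
#include <stddef.h>

#define TOY_PKTHDR	0x1	/* toy stand-in for M_PKTHDR */

struct toy_mbuf {
	struct toy_mbuf	*next;		/* next buffer in the chain */
	int		flags;		/* TOY_PKTHDR or 0 */
	int		len;		/* bytes held by this buffer */
	int		pkthdr_len;	/* total length; valid with TOY_PKTHDR */
};

/*
 * Append chain b to chain a. Both start with a packet header; b's
 * header is stripped so that only the head of the result carries one.
 */
static void
toy_cat(struct toy_mbuf *a, struct toy_mbuf *b)
{
	struct toy_mbuf *m;

	a->pkthdr_len += b->pkthdr_len;
	for (m = b; m != NULL; m = m->next) {
		m->flags &= ~TOY_PKTHDR;
		m->pkthdr_len = 0;
	}
	for (m = a; m->next != NULL; m = m->next)
		continue;
	m->next = b;
}

/* Sum of the per-buffer lengths, i.e. the real size of the chain. */
static int
toy_chain_len(const struct toy_mbuf *m)
{
	int total = 0;

	for (; m != NULL; m = m->next)
		total += m->len;
	return total;
}
```

After toy_cat(), the header length of the head equals the real chain
length, which is the property that the m_copydata() call quoted above
depends on.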
 1.13 08-Feb-2018  maxv branches: 1.13.2;
Change the error stat from IP_STAT_BADFRAGS to IP_STAT_TOOLONG. The
ping_of_death ATF test expects this counter to get increased.
 1.12 06-Feb-2018  maxv Add one more check in ip_reass_packet(): make sure that the end of each
fragment does not exceed IP_MAXPACKET.

In ip_reass(), we only check the final length of the reassembled packet
against IP_MAXPACKET.

But there is an integer overflow that can happen a little earlier. We
are doing:

	i = ntohs(p->ipqe_ip->ip_off) + ntohs(p->ipqe_ip->ip_len) -
	    ntohs(ip->ip_off);
	[...]
	ip->ip_off = htons(ntohs(ip->ip_off) + i);

It is possible that

ntohs(p->ipqe_ip->ip_off) + ntohs(p->ipqe_ip->ip_len) > 65535

so the computation of ip_off wraps to zero. This breaks an assumption in
the reassembler - it expects the list of fragments to be ordered by
offset, and here it's not ordered anymore. (Un)Fortunately I couldn't
turn this into anything exploitable.

With the new check, it is guaranteed that ip_off+ip_len<=65535.
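
The wrap can be shown in isolation. The following is a minimal sketch, not
the code from ip_reass.c: the helpers frag_end_wrapped() and frag_fits()
are hypothetical, offsets and lengths are assumed to be in host byte order
and in bytes, and IP_MAXPACKET is defined locally as 65535.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MAXPACKET	65535	/* maximum IPv4 packet size, defined locally */

/*
 * End of a fragment as it would be stored back into a 16-bit field.
 * The sum is computed in int and then truncated to 16 bits, so a
 * value above 65535 wraps around.
 */
static uint16_t
frag_end_wrapped(uint16_t off, uint16_t len)
{
	return (uint16_t)(off + len);
}

/*
 * Hypothetical form of the added check: accept a fragment only if its
 * end does not lie beyond IP_MAXPACKET. The sum uses a wider type so
 * that the comparison itself cannot wrap.
 */
static bool
frag_fits(uint16_t off, uint16_t len)
{
	return (uint32_t)off + (uint32_t)len <= IP_MAXPACKET;
}
```

For example, an offset of 65528 with a length of 8 ends at 65536, which
truncates to 0 in a 16-bit field; frag_fits() rejects that fragment.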
 1.11 11-Jan-2017  ozaki-r branches: 1.11.8;
Get rid of unnecessary header inclusions
 1.10 26-Apr-2016  ozaki-r branches: 1.10.2;
Sweep unnecessary route.h inclusions
 1.9 25-Feb-2014  pooka branches: 1.9.4; 1.9.6; 1.9.8; 1.9.12;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.8 27-Jun-2011  enami branches: 1.8.2; 1.8.12; 1.8.16;
Don't increment ip_nfragpackets when allocating the fragment queue fails.
No one will decrement it in that case.
 1.7 05-Nov-2010  rmind branches: 1.7.6;
ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.
 1.6 07-Oct-2010  yamt make ipfr_lock IPL_VM as ip_reass_drain is called in interrupts via
the drain hook for mbuf pools.
 1.5 06-Oct-2010  enami Don't free memory still in use. Fixes nfs root problem reported
by Christoph Egger on source-changes-d.
 1.4 03-Oct-2010  rmind Re-structure IPv4 reassembly code to make it more MP-friendly and simplify
some code fragments while here. Also, use pool_cache(9) and mutex(9).

IPv4 reassembly mechanism is MP-safe now.
 1.3 25-Aug-2010  rmind Use own IPv4 reassembly queue entry structure and leave struct ipqent only
for TCP. Now both struct ipfr_qent, struct ipfr_queue and hashed fragment
queue are abstracted and no longer public.
 1.2 19-Jul-2010  rmind branches: 1.2.2; 1.2.4;
Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@
 1.1 13-Jul-2010  rmind Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@
 1.2.4.4 06-Nov-2010  uebayasi Sync with HEAD.
 1.2.4.3 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.2.4.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.2.4.1 19-Jul-2010  uebayasi file ip_reass.c was added on branch uebayasi-xip on 2010-08-17 06:47:46 +0000
 1.2.2.3 09-Oct-2010  yamt sync with head
 1.2.2.2 11-Aug-2010  yamt sync with head.
 1.2.2.1 19-Jul-2010  yamt file ip_reass.c was added on branch yamt-nfs-mp on 2010-08-11 22:54:56 +0000
 1.7.6.2 05-Mar-2011  rmind sync with head
 1.7.6.1 05-Nov-2010  rmind file ip_reass.c was added on branch rmind-uvmplock on 2011-03-05 20:55:58 +0000
 1.8.16.1 18-May-2014  rmind sync with head
 1.8.12.2 03-Dec-2017  jdolecek update from HEAD
 1.8.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.8.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.9.12.1 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.9.8.1 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.9.6.2 05-Feb-2017  skrll Sync with HEAD
 1.9.6.1 29-May-2016  skrll Sync with HEAD
 1.9.4.1 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.10.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.11.8.7 17-Oct-2018  martin Pull up following revision(s) (requested by maxv in ticket #1045):

sys/netinet/ip_reass.c: revision 1.19-1.21

Hold ip_off and ip_len in the fragment entry, instead of always reading
the associated mbuf (and converting to host order). This reduces the
cache/TLB misses when processing long lists.

-

Kick fragments that would introduce several !MFFs in a reassembly chain.

The problem arises if we receive three fragments of the kind
3. A -> has MFF
1. B -> doesn't have MFF
2. C -> doesn't have MFF

Because of the received order B->C->A, we don't see that B is !MFF, and
therefore that there is a problem in this chain.

Now we do two checks, and drop us if:

* there is a fragment preceding us, and this fragment is !MFF, or
* there is a fragment following us, and we are !MFF

Spotted a long time ago.

-

Force ip_off to zero when the reassembly is complete. This was lost in my
rev1.19 - before that the IP struct was clobbered for the reassembly, but
it actually implicitly guaranteed that the first fragment of the packet
would end up with ip_off = 0, and this was a desired behavior.
 1.11.8.6 09-Oct-2018  martin Back out the following from ticket #1045 by maxv:

sys/netinet/ip_reass.c 1.19

Faster IPv4 packet reassembly - causes fallout, needs further investigation
(see PR kern/53664)
 1.11.8.5 03-Oct-2018  martin Pull up following revision(s) (requested by maxv in ticket #1045):

sys/netinet/ip_reass.c: revision 1.19

Hold ip_off and ip_len in the fragment entry, instead of always reading
the associated mbuf (and converting to host order). This reduces the
cache/TLB misses when processing long lists.
 1.11.8.4 27-Sep-2018  martin Pull up following revision(s) (requested by maxv in ticket #1041):

sys/netinet/ip_reass.c: revision 1.17 (patch)
sys/netinet6/frag6.c: revision 1.74 (patch)

When reassembling IPv4/IPv6 packets, ensure each fragment has been subject
to the same IPsec processing. That is to say, that all fragments are ESP,
or AH, or AH+ESP, or none.

The reassembly mechanism can be used both on the wire and inside an IPsec
tunnel, so we need to make sure all fragments of a packet were received
on only one side.

Even though I haven't tried, I believe there are configurations where it
would be possible for an attacker to inject an unencrypted fragment into a
legitimate stream of already-decrypted-and-authenticated fragments.

Typically on IPsec gateways with ESP tunnels, where we can encapsulate
fragments (as opposed to the general case, where we fragment encapsulated
data).

Note, for the record: a funnier thing, under IPv4, would be to send a
zero-sized !MFF fragment at the head of the packet, and manage to trigger
an ICMP error; M_DECRYPTED gets lost by the reassembly, and ICMP will reply
with the packet in clear (not encrypted).
 1.11.8.3 09-Apr-2018  martin Additionally pull up the following revision for ticket #668,
requested by ozaki-r:

sys/netinet/ip_reass.c 1.13

Change the error stat from IP_STAT_BADFRAGS to IP_STAT_TOOLONG. The
ping_of_death ATF test expects this counter to get increased.
 1.11.8.2 05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #695):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.11.8.1 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #668):
sys/netinet/ip_reass.c: revision 1.12

Add one more check in ip_reass_packet(): make sure that the end of each
fragment does not exceed IP_MAXPACKET.

In ip_reass(), we only check the final length of the reassembled packet
against IP_MAXPACKET.

But there is an integer overflow that can happen a little earlier. We
are doing:

	i = ntohs(p->ipqe_ip->ip_off) + ntohs(p->ipqe_ip->ip_len) -
	    ntohs(ip->ip_off);
	[...]
	ip->ip_off = htons(ntohs(ip->ip_off) + i);

It is possible that

ntohs(p->ipqe_ip->ip_off) + ntohs(p->ipqe_ip->ip_len) > 65535

so the computation of ip_off wraps to zero. This breaks an assumption in
the reassembler - it expects the list of fragments to be ordered by
offset, and here it's not ordered anymore. (Un)Fortunately I couldn't
turn this into anything exploitable.

With the new check, it is guaranteed that ip_off+ip_len<=65535.
 1.13.2.6 20-Oct-2018  pgoyette Sync with head
 1.13.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.13.2.4 28-Jul-2018  pgoyette Sync with HEAD
 1.13.2.3 21-May-2018  pgoyette Sync with HEAD
 1.13.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.13.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.17.2.1 10-Jun-2019  christos Sync with HEAD
 1.2 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.2 23-Jul-2004  martti branches: 1.1.1.2.2;
Import IPFilter 4.1.3
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.2;
Import IPFilter 4.1.1
 1.1.1.2.2.6 19-Oct-2004  skrll Sync with HEAD
 1.1.1.2.2.5 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.2.2.4 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.2.2.3 05-Aug-2004  skrll Fix merge mistakes.
 1.1.1.2.2.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.2.2.1 23-Jul-2004  skrll file ip_rpcb_pxy.c was added on branch ktrace-lwp on 2004-08-03 10:54:42 +0000
 1.1.1.1.2.1 13-Aug-2004  jmc branches: 1.1.1.1.2.1.2;
Pullup rev 1.1.1.2 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.1.1.1.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.3 23-Jul-2004  martti branches: 1.3.2;
Upgraded IPFilter to 4.1.3
 1.2 24-Apr-2004  matt Always include <sys/param.h> first!
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.2;
Import IPFilter 4.1.1
 1.1.1.1.2.2 13-Aug-2004  jmc Delete file (for pullup 759 requested by christos)
Bring up to ipf 4.1.3
 1.1.1.1.2.1 27-Apr-2004  jdc Pull up revision 1.2 (requested by matt in ticket #187)

Always include <sys/param.h> first!
 1.3.2.2 25-Aug-2004  skrll These are dead.
 1.3.2.1 25-Aug-2004  skrll Sync with HEAD.
 1.2 23-Jul-2004  martti branches: 1.2.2;
Upgraded IPFilter to 4.1.3
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.2;
Import IPFilter 4.1.1
 1.1.1.1.2.1 13-Aug-2004  jmc Delete file (for pullup 759 requested by christos)
Bring up to ipf 4.1.3
 1.2.2.2 25-Aug-2004  skrll These are dead.
 1.2.2.1 25-Aug-2004  skrll Sync with HEAD.
 1.2 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 4.1.1
 1.1.1.1.6.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.1.1.1.4.5 19-Oct-2004  skrll Sync with HEAD
 1.1.1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.1.4.1 28-Mar-2004  skrll file ip_scan.c was added on branch ktrace-lwp on 2004-08-03 10:54:42 +0000
 1.2 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 4.1.1
 1.1.1.1.6.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.1.1.1.4.5 19-Oct-2004  skrll Sync with HEAD
 1.1.1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.1.4.1 28-Mar-2004  skrll file ip_scan.h was added on branch ktrace-lwp on 2004-08-03 10:54:42 +0000
 1.47 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.46 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.45 10-May-2004  christos PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.44 28-Mar-2004  martti branches: 1.44.2;
Upgraded IPFilter to 4.1.1
 1.43 24-Feb-2004  wiz parameter with two es. From Peter Postma.
 1.42 19-Sep-2002  martti branches: 1.42.6;
Resync with official IPF
 1.41 19-Sep-2002  martti Upgraded IPFilter to 3.4.29
 1.40 09-Jun-2002  itojun whitespace
 1.39 01-Jun-2002  yamt make "keep state" work for SYN without win scale option.
 1.38 02-May-2002  martti branches: 1.38.2; 1.38.4;
Fix compilation problems
 1.37 02-May-2002  martti Upgraded IPFilter to 3.4.27
 1.36 09-Apr-2002  thorpej Add missing #else
 1.35 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.34 24-Jan-2002  martti Re-sync with IPFilter
 1.33 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.32 15-Nov-2001  lukem don't need <sys/types.h> when including <sys/param.h>
 1.31 13-Nov-2001  lukem add RCSIDs
 1.30 06-Apr-2001  darrenr branches: 1.30.2;
fix fragment cache security hole
 1.29 26-Mar-2001  mike Resolve conflicts.
 1.28 05-Feb-2001  chs branches: 1.28.2;
expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.27 09-Aug-2000  veego Resolve conflicts.
 1.26 23-May-2000  veego branches: 1.26.4;
Resolve conflicts.
 1.25 21-May-2000  veego Resolve conflicts.
 1.24 03-May-2000  veego Resolve conflicts.
 1.23 30-Mar-2000  augustss Remove register declarations.
 1.22 07-Feb-2000  veego Fix from Darren Reed for the test failure of f11.
 1.21 01-Feb-2000  veego Resolve conflicts.
 1.20 29-Dec-1999  veego Fix a panic which was mentioned on the ipfilter mailing list.
Patch from Darren, sent to the mailing list after he released 3.3.6 and
used the wrong procedure to update the NetBSD version
of ipfilter.
 1.19 28-Dec-1999  darrenr update ipfilter code to 3.3.6
 1.18 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.17 02-Feb-1999  cjs branches: 1.17.2; 1.17.8; 1.17.14;
Remove SCCS markers and make these compile in $NetBSD$ IDs.
 1.16 22-Nov-1998  mrg merge ipf 3.2.10
 1.15 12-Jul-1998  veego Resolve conflicts from the import.
 1.14 29-May-1998  veego Fix compiler warnings: Add missing ()'s.
 1.13 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.12 17-May-1998  veego Resolve conflicts
 1.11 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.10 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.9 21-Sep-1997  veego branches: 1.9.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.8 06-Jul-1997  thorpej branches: 1.8.2;
Restore original RCS IDs.
 1.7 05-Jul-1997  darrenr fix conflicts from import
 1.6 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.5 27-May-1997  thorpej Make this compile on 32-bit architectures again:
- Pull in includes to get appropriate prototypes.
 1.4 25-May-1997  darrenr fix conflicts
 1.3 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.2 05-Jan-1997  veego Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.26 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.25 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.24 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.23 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.22 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.21 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.20 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.19 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.18 23-May-2000  veego Import IP Filter 3.4.4
 1.1.1.17 21-May-2000  veego Import IP Filter 3.4.3
 1.1.1.16 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.15 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.14 28-Dec-1999  darrenr update DARRENR branch of netinet to 3.3.6
 1.1.1.13 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.12 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.11 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.10 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.8.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.9.2.4 24-Nov-1998  cgd pull up rev(s) 1.16 from trunk (ipfilter 3.2.10). (mrg)
 1.9.2.3 22-Jul-1998  mellon Pull up 1.15 (veego)
 1.9.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.9.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.17.14.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.17.8.4 21-Apr-2001  bouyer Sync with HEAD
 1.17.8.3 27-Mar-2001  bouyer Sync with HEAD.
 1.17.8.2 11-Feb-2001  bouyer Sync with HEAD.
 1.17.8.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.17.2.4 14-Apr-2001  he Pull up revision 1.30 (via patch, requested by darrenr):
Fix bug related to fragment cache handling.
 1.17.2.3 25-May-2000  he Apply patch (requested by darrenr):
Do not let RST TCP segments create state.
 1.17.2.2 08-Jan-2000  he Pull up revision 1.19 (requested by darrenr):
Update IPF to version 3.3.6.
 1.17.2.1 20-Dec-1999  he Pull up revision 1.18 (requested by darrenr):
Update IPF to version 3.3.5.
 1.26.4.4 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.26.4.3 09-Feb-2002  he Pull up revisions 1.28-1.29,1.31-1.24 (via patch, requested by martti):
Updated IPFilter to 3.4.23.
 1.26.4.2 14-Apr-2001  he Pull up revision 1.30 (requested by darrenr):
Fix bug related to fragment cache handling.
 1.26.4.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Caseper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Caseper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.28.2.9 20-Sep-2002  thorpej Sync with HEAD.
 1.28.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.28.2.7 04-May-2002  thorpej Update from trunk.
 1.28.2.6 17-Apr-2002  nathanw Catch up to -current.
 1.28.2.5 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.28.2.4 28-Feb-2002  nathanw Catch up to -current.
 1.28.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.28.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.28.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.30.2.5 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.30.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.30.2.3 15-Mar-2002  jdolecek fix merge botch, it's now identical to rev 1.34
 1.30.2.2 11-Feb-2002  jdolecek Sync w/ -current.
 1.30.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.38.4.2 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.38.4.1 02-Jun-2002  tv Pull up revision 1.39 (requested by yamt in ticket #116):
make "keep state" work for SYN without win scale option.
 1.38.2.1 20-Jun-2002  gehenna catch up with -current.
 1.42.6.2 19-Oct-2004  skrll Sync with HEAD
 1.42.6.1 03-Aug-2004  skrll Sync with HEAD
 1.44.2.4 16-Mar-2005  tron Apply patch (requested by martti in ticket #1110):
Disable the oow test because it is broken. It is killing valid packets.
 1.44.2.3 08-Oct-2004  jmc branches: 1.44.2.3.2;
Pullup patch (requested by darrenr in ticket #902)

* Prevent hang when attempting to flush state entries for ipv4 when ipv6
are present or vice versa
* Fix matching of IPv6 state entries when the initial packet is
sent to a multicast address. This includes not updating the address as
being fixed when a second (or further) such packet is seen before a reply.
* Disable code, for now, that limited how many ICMP packets could match a
state entry based on the number of real packets seen.
 1.44.2.2 13-Aug-2004  jmc Pullup rev 1.46 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.44.2.1 30-May-2004  tron Pull up revision 1.45 (requested by christos in ticket #416):
PR/24969: Arto Selonen: /usr/sbin/ipfs from ipfilter 4.1.1 does not work
patch applied.
 1.44.2.3.2.1 06-Feb-2005  jmc Pull up revision 1.47 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.26 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.25 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.24 28-Mar-2004  martti branches: 1.24.2;
Upgraded IPFilter to 4.1.1
 1.23 19-Sep-2002  martti branches: 1.23.6;
Upgraded IPFilter to 3.4.29
 1.22 02-May-2002  martti branches: 1.22.4;
Upgraded IPFilter to 3.4.27
 1.21 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.20 24-Jan-2002  martti Re-sync with IPFilter
 1.19 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.18 26-Mar-2001  mike branches: 1.18.2;
Resolve conflicts.
 1.17 09-Aug-2000  veego branches: 1.17.2;
Resolve conflicts.
 1.16 03-May-2000  veego branches: 1.16.4;
Resolve conflicts.
 1.15 01-Feb-2000  veego Resolve conflicts.
 1.14 12-Dec-1999  veego Resolve conflicts and small fixes.
 1.13 22-Nov-1998  mrg branches: 1.13.4; 1.13.10; 1.13.16;
merge ipf 3.2.10
 1.12 29-May-1998  veego Resolve conflicts from the import of IPFilter 3.2.7.
 1.11 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.10 14-Nov-1997  mrg merge ip-filter 3.2.1
 1.9 30-Oct-1997  mrg sigh. merge ipfilter 3.2 onto the trunk. merge to the branch was a mistake.
 1.8 21-Sep-1997  veego branches: 1.8.2;
Resolve conflicts from the merge of ipf 3.2beta5.
 1.7 06-Jul-1997  thorpej branches: 1.7.2;
Restore original RCS IDs.
 1.6 05-Jul-1997  darrenr fix conflicts from import
 1.5 28-May-1997  thorpej Resolve conflicts from merge of 3.2a7, take 2. Also, eliminate some
silly differences between the NetBSD copy of the code and the
vendor branch, keeping only those which are necessary. Of those
differences that currently exist, several "portability to NetBSD"
issues, which will be fed back to the ipfilter author.
 1.4 25-May-1997  darrenr fix conflicts
 1.3 29-Mar-1997  thorpej Resolve conflicts from merge.

XXX !!! XXX !!!
I noticed a few semi-serious bugs while doing this merge, one of which
has existed for a fairly long time. Some of them are addressed in this
commit (because they caused the kernel to not compile), and are annotated
by "XXX" and "--thorpej". The other one will be addressed shortly in
a future commit, and, as far as I can tell, affects all operating systems
which IP Filter supports.
 1.2 05-Jan-1997  veego Add $NetBSD$ id's and restore the original Id's.
 1.1 05-Jan-1997  mrg branches: 1.1.1;
initial import of darren reed's ip-filter, version 3.1.2.
 1.1.1.23 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.22 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.21 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.20 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.19 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.18 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.17 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.16 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.15 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.14 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.13 12-Dec-1999  veego Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.12 22-Nov-1998  mrg ip filter version 3.2.10
 1.1.1.11 12-Jul-1998  veego Import IP Filter 3.2.9
 1.1.1.10 29-May-1998  veego Import IP Filter 3.2.7
 1.1.1.9 17-May-1998  veego Import IP Filter 3.2.5
 1.1.1.8 14-Nov-1997  mrg import ip-filter 3.2.1
 1.1.1.7 30-Oct-1997  mrg import ip-filter 3.2
 1.1.1.6 21-Sep-1997  veego Import ip-filter 3.2beta5
 1.1.1.5 05-Jul-1997  darrenr import 3.2beta1 IP Filter sources
 1.1.1.4 27-May-1997  thorpej ipfilter2netbsd did not produce correct output for last import of
3.2a7. Re-import it now that ipfilter2netbsd is fixed.
 1.1.1.3 25-May-1997  darrenr Import version 3.2alpha7
 1.1.1.2 27-Mar-1997  darrenr Bring in entire 3.2alpha2 source tree
 1.1.1.1 27-Mar-1997  darrenr Update to version 3.2alpha2
 1.7.2.1 22-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.8.2.4 24-Nov-1998  cgd pull up rev(s) 1.13 from trunk (ipfilter 3.2.10). (mrg)
 1.8.2.3 22-Jul-1998  mellon Pull up 1.12 (veego)
 1.8.2.2 17-Nov-1997  mrg pull up from trunk: ipfilter 3.2.1 (plus bug fix for fil.c from marc boucher).
 1.8.2.1 30-Oct-1997  mrg merge ipfilter 3.2
 1.13.16.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.13.10.2 27-Mar-2001  bouyer Sync with HEAD.
 1.13.10.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.13.4.1 20-Dec-1999  he Pull up revision 1.14 (requested by darrenr):
Update IPF to version 3.3.5.
 1.16.4.3 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.16.4.2 09-Feb-2002  he Pull up revisions 1.18-1.20 (requested by martti):
Updated IPFilter to 3.4.23
 1.16.4.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Caseper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Caseper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.17.2.5 20-Sep-2002  thorpej Sync with HEAD.
 1.17.2.4 04-May-2002  thorpej Update from trunk.
 1.17.2.3 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.17.2.2 28-Feb-2002  nathanw Catch up to -current.
 1.17.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.18.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.18.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.18.2.1 11-Feb-2002  jdolecek Sync w/ -current.
 1.22.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.23.6.2 19-Oct-2004  skrll Sync with HEAD
 1.23.6.1 03-Aug-2004  skrll Sync with HEAD
 1.24.2.1 13-Aug-2004  jmc branches: 1.24.2.1.2;
Pullup rev 1.25 (requested by christos in ticket #759)

Bring up to ipf 4.1.3
 1.24.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.26 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.2 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 4.1.1
 1.1.1.1.6.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.1.1.1.4.5 19-Oct-2004  skrll Sync with HEAD
 1.1.1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.1.4.1 28-Mar-2004  skrll file ip_sync.c was added on branch ktrace-lwp on 2004-08-03 10:54:43 +0000
 1.2 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.1 28-Mar-2004  martti branches: 1.1.1;
Initial revision
 1.1.1.1 28-Mar-2004  martti branches: 1.1.1.1.4; 1.1.1.1.6;
Import IPFilter 4.1.1
 1.1.1.1.6.1 06-Feb-2005  jmc Pull up revision 1.2 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.1.1.1.4.5 19-Oct-2004  skrll Sync with HEAD
 1.1.1.1.4.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.1.1.4.3 18-Sep-2004  skrll Sync with HEAD.
 1.1.1.1.4.2 03-Aug-2004  skrll Sync with HEAD
 1.1.1.1.4.1 28-Mar-2004  skrll file ip_sync.h was added on branch ktrace-lwp on 2004-08-03 10:54:43 +0000
 1.135 27-Jun-2025  andvar Grammar and spelling fixes, mainly in comments. A few in documentation,
logging, test description, and SCSI ASC/ASCQ assignment descriptions.
 1.134 10-Apr-2022  andvar branches: 1.134.10;
fix various typos in comments and output/log messages.
 1.133 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.132 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.131 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
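
For readers unfamiliar with the idea behind revisions 1.131 and 1.132 above, the following is a minimal sketch of checking an on-wire layout at compile time instead of relying on __packed. It uses the standard C11 static_assert rather than the kernel's CTASSERT/__CTASSERT macros, and struct demo_hdr and demo_hdr_size() are hypothetical names invented for illustration; they are not taken from the NetBSD sources.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof, size_t */
#include <stdint.h>	/* uint16_t */

/*
 * Hypothetical 8-byte header made of four naturally aligned 16-bit
 * fields.  Because every member is already aligned, the compiler
 * inserts no padding and __packed would change nothing.
 */
struct demo_hdr {
	uint16_t dh_sport;	/* source port */
	uint16_t dh_dport;	/* destination port */
	uint16_t dh_len;	/* datagram length */
	uint16_t dh_sum;	/* checksum */
};

/* Fail the build if the layout ever stops matching the wire format. */
static_assert(sizeof(struct demo_hdr) == 8, "demo_hdr must be 8 bytes");
static_assert(offsetof(struct demo_hdr, dh_sport) == 0, "dh_sport offset");
static_assert(offsetof(struct demo_hdr, dh_dport) == 2, "dh_dport offset");
static_assert(offsetof(struct demo_hdr, dh_len) == 4, "dh_len offset");
static_assert(offsetof(struct demo_hdr, dh_sum) == 6, "dh_sum offset");

/* Run-time view of the same fact, handy for a quick sanity check. */
size_t
demo_hdr_size(void)
{
	return sizeof(struct demo_hdr);
}
```

The assertions cost nothing at run time; a layout change that would break the wire format becomes a compile error rather than a silent bug.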
 1.130 28-Aug-2020  ozaki-r branches: 1.130.2;
inet: reduce silent packet discards
 1.129 28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.128 13-May-2019  ozaki-r Count packets dropped by pfil
 1.127 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.126 10-Jul-2018  maxv Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.
 1.125 08-Apr-2018  maxv branches: 1.125.2;
Remove the ipre_mlast field and the TRAVERSE macro.

The goal was to store in ipre_mlast the last mbuf of the chain, so that
m_cat could be called on it. But it's not needed, since m_cat already
does the equivalent of TRAVERSE itself.

If it were needed, there would be a bug, since we don't call TRAVERSE on
ipre_mlast when creating a new reassembly entry.
 1.124 08-Apr-2018  maxv Remove unused field, and sync comment with reality.
 1.123 03-Apr-2018  maxv Remove unused fields and outdated comment.
 1.122 10-Jan-2018  knakahara branches: 1.122.2;
add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.121 11-Dec-2017  ryo As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.120 10-Aug-2017  ryo Add IP_PKTINFO support for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
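
As a rough userland illustration of the 1.120 entry above, the sketch below builds the control part of a message so that it carries one IP_PKTINFO item selecting an output interface. The function name demo_set_pktinfo() and its parameters are assumptions made for this example. Only ipi_ifindex is filled in, because the field that carries the source address differs between systems; check ip(4) on the target system before setting an address.

```c
#define _DEFAULT_SOURCE	/* glibc: expose struct in_pktinfo; ignored elsewhere */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Attach one IP_PKTINFO control message to msg, requesting output on
 * interface ifindex.  cbuf is caller-provided storage of cbuflen bytes
 * and should be suitably aligned for struct cmsghdr.
 * Returns 0 on success, -1 if cbuf is too small.
 */
int
demo_set_pktinfo(struct msghdr *msg, void *cbuf, size_t cbuflen,
    unsigned int ifindex)
{
	struct in_pktinfo pi;
	struct cmsghdr *cm;

	if (cbuflen < CMSG_SPACE(sizeof(pi)))
		return -1;

	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pi));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = IP_PKTINFO;
	cm->cmsg_len = CMSG_LEN(sizeof(pi));

	/* Zero the address fields; only the interface index is chosen. */
	memset(&pi, 0, sizeof(pi));
	pi.ipi_ifindex = ifindex;
	memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
	return 0;
}
```

A caller would fill in msg_name and msg_iov as usual and then pass the msghdr to sendmsg(2) on a SOCK_DGRAM or SOCK_RAW socket.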
 1.119 31-Mar-2017  ozaki-r branches: 1.119.6;
Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)
 1.118 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.117 16-Feb-2017  knakahara add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.116 08-Dec-2016  ozaki-r branches: 1.116.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.115 01-Aug-2016  knakahara improve fast-forward performance when the number of flows exceeds IPFLOW_MAX.

In the fast-forward case, when the number of flows exceeds IPFLOW_MAX, the
performance degraded to about 50% compared to the case less than IPFLOW_MAX
flows. This modification suppresses the degradation to 65%. Furthermore,
the modified kernel is about the same performance as the original kernel
when the number of flows is less than IPFLOW_MAX.

The original patch is implemented by ryo@n.o. Thanks.
 1.114 21-Jun-2016  ozaki-r branches: 1.114.2;
Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time while we access it via them. Thus we need to look up
an ifnet object by if_index every time, for safety.
 1.113 13-Jun-2016  knakahara make ipflow_reap() static function.
 1.112 28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psref'ing) rtentries.
 1.111 26-Apr-2016  ozaki-r Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that the rt_gwroute checks hid the mistakes, so it appeared to
work (unexpectedly), and removing the rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to take care of is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.110 20-Jan-2016  riastradh Give proper prototype to ip_output.
 1.109 20-Jan-2016  riastradh Give proper prototype to rip_output.
 1.108 04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find out detailed investigations by dyoung about the
issue in there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.107 11-Oct-2014  christos branches: 1.107.2;
expose multicast option functions, which are now used by the v6 code.
 1.106 05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.105 30-May-2014  rmind Use __CTASSERT() in the header.
 1.104 29-May-2014  rmind Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.103 23-May-2014  rmind Make ip_forward() static, there is no need to expose it.
 1.102 22-May-2014  rmind - Make ip_setmoptions(), ip_getmoptions() and ip_pcbopts() static.
- ip_output: eliminate 7th variadic argument; IP_RETURNMTU is flag
always used to store MTU size into struct inpcb::inp_errormtu.
- Clean up these routines: reduce #ifdefs, variable scopes, etc.
 1.101 22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.100 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.99 19-Mar-2014  liamjfoy branches: 1.99.2;
Move ipflow into ip_var.h and fix conflict
 1.98 19-Mar-2014  liamjfoy Remove ipflow_prune and replace with ipflow_reap. ok rmind@
 1.97 03-May-2011  dyoung branches: 1.97.4; 1.97.14; 1.97.18;
*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
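
The deferred-drain idea in 1.97 can be sketched roughly as follows. This is a minimal single-threaded illustration only: demo_drain(), demo_fasttimo(), drain_needed and cached_entries are hypothetical names, not the NetBSD symbols, and a real kernel would need proper synchronization around the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the NetBSD symbols. */
static bool drain_needed;          /* set by demo_drain(), cleared by the timer */
static size_t cached_entries = 3;  /* pretend cache that a drain would empty */

/* May be called with locks held, so it only records the request. */
void demo_drain(void)
{
    drain_needed = true;
}

/* Periodic timer handler: the actual work is done here, in a safe context. */
void demo_fasttimo(void)
{
    if (drain_needed) {
        drain_needed = false;
        cached_entries = 0;        /* stands in for freeing the cached data */
    }
}

size_t demo_cached(void)
{
    return cached_entries;
}
```

The point of the split is that demo_drain() never takes locks or frees memory itself; it stays cheap regardless of the caller's context.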
 1.96 05-Nov-2010  rmind branches: 1.96.2;
ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.
 1.95 25-Aug-2010  rmind Use own IPv4 reassembly queue entry structure and leave struct ipqent only
for TCP. Now both struct ipfr_qent, struct ipfr_queue and hashed fragment
queue are abstracted and no longer public.
 1.94 19-Jul-2010  rmind Revert previous change of making struct ipqent invisible to userland.
 1.93 19-Jul-2010  rmind Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@
 1.92 13-Jul-2010  rmind Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@
 1.91 01-Feb-2009  pooka branches: 1.91.4; 1.91.6;
Init ipflow pool dynamically instead of using a linkset.
 1.90 12-Oct-2008  plunky branches: 1.90.2;
update ip_pcbopts() to use sockopt(9) API.

cleans up function and one small fix is that we now stop copying user
options to the mbuf when the _EOL is given, previously this function
would continue to copy options.
 1.89 16-Aug-2008  plunky constify sockopt in the PRCO_SETOPT path
 1.88 06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.87 12-Apr-2008  thorpej branches: 1.87.4; 1.87.6; 1.87.10;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.86 09-Apr-2008  thorpej - ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).
 1.85 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.84 06-Feb-2008  matt branches: 1.84.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
 1.83 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.82 22-Dec-2007  matt Make sure ip_newid etal doesn't return an ip_id of 0.
 1.81 22-Dec-2007  matt Add ipq_tos to struct ipqe. (Doesn't increase size since the last member
was a u_int16_t).
 1.80 02-Oct-2007  dyoung branches: 1.80.4; 1.80.6; 1.80.10;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.
 1.79 25-Mar-2007  liamjfoy branches: 1.79.8; 1.79.10; 1.79.12;
Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.
 1.78 17-Feb-2007  dyoung branches: 1.78.4; 1.78.6; 1.78.8;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.77 16-Feb-2006  perry branches: 1.77.20;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.76 24-Dec-2005  perry branches: 1.76.2; 1.76.4; 1.76.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.75 11-Dec-2005  christos merge ktrace-lwp.
 1.74 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
few days ago.
 1.73 22-Nov-2005  yamt revert rev.1.72 as it isn't necessary.
 1.72 06-May-2005  matt branches: 1.72.2; 1.72.8;
Add #include <sys/protosw.h> when _KERNEL
 1.71 29-Apr-2005  yamt move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.70 07-Apr-2005  yamt when doing TSO, avoid to use duplicated ip_id heavily.
XXX ip_randomid
 1.69 15-Dec-2004  thorpej branches: 1.69.2; 1.69.8;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.68 22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.67 21-Apr-2004  itojun no space between function name and paren: foo (blah) -> foo(blah)
 1.66 18-Apr-2004  matt De __P()
 1.65 12-Dec-2003  scw Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.
 1.64 08-Dec-2003  jonathan Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.
 1.63 06-Dec-2003  jonathan Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
 1.62 26-Nov-2003  itojun define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.
 1.61 25-Nov-2003  itojun knf
 1.60 17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.59 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.58 19-Aug-2003  itojun make ip_fragment public (it is for coming PF integration)
 1.57 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.56 29-Jun-2003  fvdl branches: 1.56.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.55 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.54 23-Jun-2003  martin Protect opt_*.h includes by _KERNEL_OPT
 1.53 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.52 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.51 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.50 28-Jan-2003  wiz success, not sucess. Noted by mjl.
 1.49 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.48 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
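
A rough sketch of the kind of alignment predicate described in 1.48. The names DEMO_HDR_ALIGNED_P, DEMO_NO_STRICT_ALIGNMENT and demo_needs_realign() are made up for illustration, and the 4-byte requirement is an assumption; the real macros live in the netinet headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * On platforms without strict alignment the predicate is the constant 1,
 * so the compiler can drop the realignment path entirely.
 */
#ifdef DEMO_NO_STRICT_ALIGNMENT
#define DEMO_HDR_ALIGNED_P(p)   1
#else
/* Assumed requirement: header starts on a 4-byte boundary. */
#define DEMO_HDR_ALIGNED_P(p)   ((((uintptr_t)(p)) & 3) == 0)
#endif

/* Returns nonzero when the caller would have to copy the header up. */
int demo_needs_realign(const void *hdr)
{
    return !DEMO_HDR_ALIGNED_P(hdr);
}
```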
 1.47 07-May-2002  matt branches: 1.47.2;
Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.
 1.46 21-Dec-2001  itojun have rip_ctlinput to notify routing changes to raw sockets
(protosw change to be done). sync with kame
 1.45 02-Mar-2001  itojun branches: 1.45.2; 1.45.4;
increase ipstat.ips_badaddr if the packet fails to pass address checks.
 1.44 13-Jan-2001  itojun allow IP_MULTICAST_IF and IP_ADD/DROP_MEMBERSHIP to specify interface
by interface index. if the interface address specified is in 0.0.0.0/8
it will be considered as interface index in network byteorder.

getsockopt(IP_MULTICAST_IF) preserves old behavior if
setsockopt(IP_MULTICAST_IF) was done with interface address, and
returns interface index if setsockopt(IP_MULTICAST_IF) was done with
interface index (again using the form in 0.0.0.0/8).

Suggested by Dave Thaler, based on RIPv2 MIB spec (RFC1724 section 3.3).

http://mail-index.netbsd.org/tech-net/2001/01/13/0003.html
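
As an illustration of the 0.0.0.0/8 convention from 1.44, the following sketch encodes and decodes an interface index carried in a struct in_addr. The helper names ifindex_to_addr() and addr_to_ifindex() are hypothetical, and masking the index to 24 bits is an assumption of this sketch, not something the log states.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>

/* Encode an interface index as an address in 0.0.0.0/8, network byte order. */
struct in_addr ifindex_to_addr(uint32_t ifindex)
{
    struct in_addr a;

    a.s_addr = htonl(ifindex & 0x00ffffffu);  /* keep the top octet zero */
    return a;
}

/*
 * Returns 1 and stores the index if the address lies in 0.0.0.0/8;
 * returns 0 if it is an ordinary interface address.
 */
int addr_to_ifindex(struct in_addr a, uint32_t *ifindex)
{
    uint32_t host = ntohl(a.s_addr);

    if ((host & 0xff000000u) != 0)
        return 0;
    *ifindex = host;
    return 1;
}
```

A caller would pass the encoded address wherever IP_MULTICAST_IF or the membership options expect an interface address.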
 1.43 17-Oct-2000  thorpej Add an IP_MTUDISC flag to the flags that can be passed to
ip_output(). This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.

This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.

Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
 1.42 25-Aug-2000  tron Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.
 1.41 30-Mar-2000  simonb branches: 1.41.4;
Delete redundant decl of ip_gif_ttl - it's in <netinet/in_gif.h>.
Delete redundant decl of ip_mforward() - it's in <netinet/ip_mroute.h>.
 1.40 20-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol data.
 1.39 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.38 06-Jul-1999  itojun branches: 1.38.2; 1.38.8;
sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.37 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.36 08-Oct-1998  thorpej branches: 1.36.8; 1.36.10;
Use the pool allocator for ipflow entries.
 1.35 08-Oct-1998  thorpej Use the pool allocator for ipqent structures.
 1.34 02-Jun-1998  thorpej In addition to the IP flow hash table, put the flows on a list. The table
is used for fast lookup, the list for traversal of all flows. Also, use
PRT timers.
 1.33 11-May-1998  thorpej Back out previous. This problem was already fixed in a different way.
 1.32 11-May-1998  matt Let usr.sbin/tcpdump build again.
 1.31 04-May-1998  matt Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.
 1.30 30-Apr-1998  thorpej Need <net/route.h>
 1.29 29-Apr-1998  matt Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
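
The incremental checksum adjustment mentioned in 1.29 can be sketched as below. This is not the ipflow code itself: it applies the generic update rule from RFC 1624 (eqn. 3) to a raw header buffer, and the function names are made up for the example. It assumes the caller has already checked that the TTL is greater than 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum over big-endian 16-bit words (len must be even). */
uint16_t cksum_full(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), in one's complement arithmetic. */
uint16_t cksum_update(uint16_t hc, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~hc;

    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) and patch the checksum (bytes 10-11) in place. */
void ttl_decrement(uint8_t *hdr)
{
    uint16_t old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    uint16_t hc = (uint16_t)((hdr[10] << 8) | hdr[11]);

    hdr[8]--;
    uint16_t new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);

    hc = cksum_update(hc, old_word, new_word);
    hdr[10] = (uint8_t)(hc >> 8);
    hdr[11] = (uint8_t)(hc & 0xff);
}
```

Only the 16-bit word holding the TTL and the checksum field are touched, which is what makes the fast path cheaper than recomputing the sum over the whole header.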
 1.28 29-Apr-1998  matt New TCP reassembly code. The new code reduces the memory needed by
out-of-order packets and builds the infrastructure needed for sending
SACK blocks (to be added shortly).
 1.27 29-Apr-1998  kml Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.
 1.26 24-Mar-1998  kml Ensure that we take the IP option length into account when we calculate
the effective maximum send size for TCP. ip_optlen() and tcp_optlen()
should probably be inlined for efficiency.
 1.25 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.24 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.23 05-Jan-1998  lukem enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
 1.22 18-Oct-1997  kml branches: 1.22.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
 1.21 14-Oct-1997  thorpej Define IP_RETURNMTU. (Matt missed this part of his diff, I guess :-)
 1.20 24-Jun-1997  thorpej branches: 1.20.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.
 1.19 11-Jan-1997  thorpej Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
 1.18 25-Oct-1996  thorpej Make length and offset fields unsigned. From Kevin M. Lahey <kml@nas.nasa.gov>
Add a counter to IP stats, to count packets which are discarded on the
grounds that they are too large.
 1.17 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.16 13-Feb-1996  christos branches: 1.16.4;
netinet prototypes
 1.15 21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
 1.14 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.13 14-May-1995  cgd drop (and record) malformed IP fragments. Fixes pr 1030 (differently).
 1.12 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.11 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.10 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 10-Jan-1994  mycroft Change the counters to be all the same type -- u_long.
 1.7 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.6 09-Jan-1994  mycroft Prototype the rest.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.16.4.2 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.16.4.1 10-Nov-1996  thorpej Update from trunk:
- Make ip_len and ip_off unsigned.
- Make sure we don't accept or transmit packets larger than the
maximum IP packet size.
This fixes the so-called `death ping' bug.

Sum of work from Bill Fenner <fenner@parc.xerox.com>,
Kevin Lahey <kml@nas.nasa.gov>, and myself.

Thanks to Curt Sampson, Jukka Marin, and Kevin Lahey for testing
this under NetBSD 1.2
 1.20.4.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.22.2.1 09-May-1998  mycroft Pull up patch from kml.
 1.36.10.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.36.10.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.36.10.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.36.8.2 02-Aug-1999  thorpej Update from trunk.
 1.36.8.1 01-Jul-1999  thorpej Sync w/ -current.
 1.38.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.38.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.38.2.2 18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.38.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.41.4.1 26-Aug-2000  tron Pull up from current (approved by thorpej):

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.

syssrc/sys/netinet/in.h 1.49 -> 1.50
syssrc/sys/netinet/in_pcb.c 1.66 -> 1.67
syssrc/sys/netinet/ip_input.c 1.116 -> 1.117
syssrc/sys/netinet/ip_var.h 1.41 -> 1.42
 1.45.4.4 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.45.4.3 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.45.4.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.45.4.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.45.2.4 17-Sep-2002  nathanw Catch up to -current.
 1.45.2.3 01-Aug-2002  nathanw Catch up to -current.
 1.45.2.2 20-Jun-2002  nathanw Catch up to -current.
 1.45.2.1 08-Jan-2002  nathanw Catch up to -current.
 1.47.2.1 15-Jul-2002  gehenna catch up with -current.
 1.56.2.7 11-Dec-2005  christos Sync with head.
 1.56.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.56.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.56.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.56.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.56.2.2 03-Aug-2004  skrll Sync with HEAD
 1.56.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.69.8.1 13-Apr-2005  tron Pull up revision 1.70 (requested by yamt in ticket #145):
when doing TSO, avoid to use duplicated ip_id heavily.
XXX ip_randomid
 1.69.2.1 29-Apr-2005  kent sync with -current
 1.72.8.1 29-Nov-2005  yamt sync with head.
 1.72.2.6 11-Feb-2008  yamt sync with head.
 1.72.2.5 21-Jan-2008  yamt sync with head
 1.72.2.4 27-Oct-2007  yamt sync with head.
 1.72.2.3 03-Sep-2007  yamt sync with head.
 1.72.2.2 26-Feb-2007  yamt sync with head.
 1.72.2.1 21-Jun-2006  yamt sync with head.
 1.76.6.1 22-Apr-2006  simonb Sync with head.
 1.76.4.1 09-Sep-2006  rpaulo sync with head
 1.76.2.1 18-Feb-2006  yamt sync with head.
 1.77.20.2 15-Apr-2007  yamt sync with head.
 1.77.20.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.78.8.1 29-Mar-2007  reinoud Pullup to -current
 1.78.6.1 11-Jul-2007  mjf Sync with head.
 1.78.4.2 09-Oct-2007  ad Sync with head.
 1.78.4.1 10-Apr-2007  ad Sync with head.
 1.79.12.1 06-Oct-2007  yamt sync with head.
 1.79.10.3 23-Mar-2008  matt sync with HEAD
 1.79.10.2 09-Jan-2008  matt sync with HEAD
 1.79.10.1 06-Nov-2007  matt sync with HEAD
 1.79.8.1 04-Oct-2007  joerg Sync with HEAD.
 1.80.10.1 02-Jan-2008  bouyer Sync with HEAD
 1.80.6.1 26-Dec-2007  ad Sync with head.
 1.80.4.1 18-Feb-2008  mjf Sync with HEAD.
 1.84.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.84.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.84.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.87.10.1 19-Oct-2008  haad Sync with HEAD.
 1.87.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.87.4.3 09-Oct-2010  yamt sync with head
 1.87.4.2 11-Aug-2010  yamt sync with head.
 1.87.4.1 04-May-2009  yamt sync with head.
 1.90.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.91.6.2 31-May-2011  rmind sync with head
 1.91.6.1 05-Mar-2011  rmind sync with head
 1.91.4.3 06-Nov-2010  uebayasi Sync with HEAD.
 1.91.4.2 22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.91.4.1 17-Aug-2010  uebayasi Sync with HEAD.
 1.96.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.97.18.3 18-May-2014  rmind sync with head
 1.97.18.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.97.18.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.97.14.2 03-Dec-2017  jdolecek update from HEAD
 1.97.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.97.4.1 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.99.2.1 10-Aug-2014  tls Rebase.
 1.107.2.7 28-Aug-2017  skrll Sync with HEAD
 1.107.2.6 05-Feb-2017  skrll Sync with HEAD
 1.107.2.5 05-Oct-2016  skrll Sync with HEAD
 1.107.2.4 09-Jul-2016  skrll Sync with HEAD
 1.107.2.3 29-May-2016  skrll Sync with HEAD
 1.107.2.2 19-Mar-2016  skrll Sync with HEAD
 1.107.2.1 06-Jun-2015  skrll Sync with HEAD
 1.114.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.114.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.114.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.114.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.116.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.119.6.2 11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.119.6.1 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.122.2.4 30-Sep-2018  pgoyette Sync with HEAD
 1.122.2.3 28-Jul-2018  pgoyette Sync with HEAD
 1.122.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.122.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.125.2.1 10-Jun-2019  christos Sync with HEAD
 1.130.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.134.10.1 02-Aug-2025  perseant Sync with HEAD
 1.17 01-Oct-2004  christos Move ipf to sys/dist/ipf; Note that I followed the pattern used for pf.
I think though that the files.ipfilter and Makefile glue should go to
the dist directory, not like it is done now.
 1.16 23-Jul-2004  martti Upgraded IPFilter to 4.1.3
 1.15 28-Mar-2004  martti branches: 1.15.2;
Upgraded IPFilter to 4.1.1
 1.14 19-Sep-2002  martti branches: 1.14.6;
Upgraded IPFilter to 3.4.29
 1.13 02-May-2002  martti branches: 1.13.4;
Upgraded IPFilter to 3.4.27
 1.12 14-Mar-2002  martti Upgraded IPFilter to 3.4.25
 1.11 24-Jan-2002  martti Upgraded IPFilter to 3.4.23
 1.10 26-Mar-2001  mike branches: 1.10.2;
Resolve conflicts.
 1.9 09-Aug-2000  veego branches: 1.9.2; 1.9.4;
Resolve conflicts.
 1.8 12-Jun-2000  veego branches: 1.8.2;
Resolve conflicts.
 1.7 23-May-2000  veego branches: 1.7.2;
Resolve conflicts.
 1.6 21-May-2000  veego Resolve conflicts.
 1.5 11-May-2000  veego Resolve conflicts and fix a compile error in ip_ftp_pxy.c.
 1.4 03-May-2000  veego Resolve conflicts.
 1.3 01-Feb-2000  veego Resolve conflicts.
 1.2 28-Dec-1999  darrenr update ipfilter code to 3.3.6
 1.1 12-Dec-1999  veego branches: 1.1.1;
Initial revision
 1.1.1.16 23-Jul-2004  martti Import IPFilter 4.1.3
 1.1.1.15 28-Mar-2004  martti Import IPFilter 4.1.1
 1.1.1.14 19-Sep-2002  martti Import IPFilter 3.4.29
 1.1.1.13 02-May-2002  martti Import IPFilter 3.4.27
 1.1.1.12 14-Mar-2002  martti Import IPFilter 3.4.25
 1.1.1.11 24-Jan-2002  martti Import IPFilter 3.4.23
 1.1.1.10 26-Mar-2001  mike Import IP Filter 3.4.16
 1.1.1.9 09-Aug-2000  veego Import IP Filter 3.4.9
 1.1.1.8 12-Jun-2000  veego Import IP Filter 3.4.6
 1.1.1.7 23-May-2000  veego Import IP Filter 3.4.4
 1.1.1.6 21-May-2000  veego Import IP Filter 3.4.3
 1.1.1.5 11-May-2000  veego Import IP Filter 3.4.2
 1.1.1.4 03-May-2000  veego Import IP Filter 3.4.1
 1.1.1.3 01-Feb-2000  veego Import IP Filter 3.3.8
 1.1.1.2 28-Dec-1999  darrenr update DARRENR branch of netinet to 3.3.6
 1.1.1.1 12-Dec-1999  veego branches: 1.1.1.1.2; 1.1.1.1.4;
Import a few IP Filter 3.3.5 files under sys/netinet.
 1.1.1.1.4.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.1.1.1.2.3 08-Jan-2000  he Pull up revision 1.2 (requested by darrenr):
Update IPF to version 3.3.6.
 1.1.1.1.2.2 20-Dec-1999  he Pull up revision 1.1.1.1 (new) (requested by darrenr):
Update IPF to version 3.3.5.
 1.1.1.1.2.1 12-Dec-1999  he file ipl.h was added on branch netbsd-1-4 on 1999-12-20 21:02:07 +0000
 1.7.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.8.2.3 18-Oct-2002  itojun dist/ipf/BNF 1.5
dist/ipf/HISTORY 1.12-1.14
dist/ipf/Makefile 1.6
dist/ipf/QNX_OCL.txt 1.1 (new)
dist/ipf/common.c 1.1.1.5,1.2
dist/ipf/facpri.c 1.4
dist/ipf/fils.c 1.14-1.21
dist/ipf/ip_lfil.c deleted
dist/ipf/ip_sfil.c deleted
dist/ipf/ipf.c 1.8-1.13
dist/ipf/ipf2netbsd 1.6-1.9
dist/ipf/ipfs.c 1.6-1.10
dist/ipf/ipft_ef.c 1.4-1.7
dist/ipf/ipft_hx.c 1.4-1.5
dist/ipf/ipft_pc.c 1.4-1.5
dist/ipf/ipft_sn.c 1.4-1.5
dist/ipf/ipft_td.c 1.4-1.7
dist/ipf/ipft_tx.c 1.5-1.8
dist/ipf/iplang/iplang_y.y 1.4
dist/ipf/ipmon.c 1.8-1.17
dist/ipf/ipnat.c 1.9-1.12
dist/ipf/ipsend/44arp.c 1.3
dist/ipf/ipsend/arp.c 1.3
dist/ipf/ipsend/in_var.h 1.2
dist/ipf/ipsend/ip.c 1.4-1.5
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipresend.c 1.3-1.4
dist/ipf/ipsend/ipsend.c 1.5-1.8
dist/ipf/ipsend/ipsopt.c 1.3-1.4
dist/ipf/ipsend/iptest.c 1.4-1.5
dist/ipf/ipsend/iptests.c 1.3-1.5
dist/ipf/ipsend/lsock.c 1.3
dist/ipf/ipsend/resend.c 1.4-1.5
dist/ipf/ipsend/sbpf.c 1.3
dist/ipf/ipsend/sirix.c 1.3
dist/ipf/ipsend/sock.c 1.4-1.5
dist/ipf/ipt.c 1.5-1.10
dist/ipf/kmem.c 1.5-1.10
dist/ipf/l4check/l4check.c 1.1.1.2
dist/ipf/man/ipf.4 1.8-1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipl.4 1.4
dist/ipf/man/ipmon.8 1.8-1.10
dist/ipf/man/ipnat.4 1.3
dist/ipf/man/ipnat.5 1.5-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.4-1.7
dist/ipf/natparse.c 1.6-1.10
dist/ipf/opt.c 1.4-1.5
dist/ipf/parse.c 1.11-1.13
dist/ipf/printnat.c 1.3-1.10
dist/ipf/printstate.c 1.2-1.3
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
sys/netinet/fil.c 1.53-1.58
sys/netinet/ip_auth.c 1.25-1.30
sys/netinet/ip_compat.h 1.27-1.31
sys/netinet/ip_fil.c 1.76-1.79,1.81-1.86
sys/netinet/ip_fil.h 1.43-1.49
sys/netinet/ip_frag.c 1.27-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.22-1.26
sys/netinet/ip_h323_pxy.c 1.7 (new)
sys/netinet/ip_ipsec_pxy.c 1.2
sys/netinet/ip_log.c 1.18-1.23
sys/netinet/ip_nat.c 1.45-1.54
sys/netinet/ip_nat.h 1.25-1.27
sys/netinet/ip_netbios_pxy.c 1.2-1.4
sys/netinet/ip_proxy.c 1.28-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.35-1.42
sys/netinet/ip_state.h 1.21-1.23
sys/netinet/ipl.h 1.12-1.14
usr.sbin/ipf/ipftest/Makefile 1.19

Upgrade IPFilter to 3.4.29.

regression test is omitted. (martti)
 1.8.2.2 09-Feb-2002  he Pull up revisions 1.10-1.11 (requested by martti):
Updated IPFilter to 3.4.23
 1.8.2.1 31-Aug-2000  veego Pull up ipf 3.4.9 (requested by veego). approved by releng-1-5.

basesrc/dist/ipf/HISTORY 1.8 -> 1.9
basesrc/dist/ipf/fils.c 1.9 -> 1.10
basesrc/dist/ipf/ip_sfil.c 1.5 -> 1.6
basesrc/dist/ipf/ipf.c 1.4 -> 1.5
basesrc/dist/ipf/ipmon.c 1.4 -> 1.5
basesrc/dist/ipf/ipnat.c 1.5 -> 1.6
basesrc/dist/ipf/natparse.c 1.3 -> 1.4
basesrc/dist/ipf/parse.c 1.4 -> 1.5
basesrc/dist/ipf/iplang/iplang_y.y 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.1 1.2 -> 1.3
basesrc/dist/ipf/ipsend/ipsend.5 1.1 -> 1.2
syssrc/sys/netinet/fil.c 1.36 -> 1.37
syssrc/sys/netinet/ip_auth.c 1.17 -> 1.18
syssrc/sys/netinet/ip_fil.c 1.57 -> 1.58
syssrc/sys/netinet/ip_ftp_pxy.c 1.16 -> 1.17
syssrc/sys/netinet/ip_log.c 1.10 -> 1.11
syssrc/sys/netinet/ip_nat.c 1.34 -> 1.35
syssrc/sys/netinet/ip_nat.h 1.20 -> 1.21
syssrc/sys/netinet/ip_rcmd_pxy.c 1.4 -> 1.5
syssrc/sys/netinet/ip_state.c 1.26 -> 1.27
syssrc/sys/netinet/ip_state.h 1.16 -> 1.17
syssrc/sys/netinet/ipl.h 1.8 -> 1.9

Changes:
>3.4.9 08/08/2000 - Released
>
>implement new aging mechanism in fr_tcp_age()
>
>fix icmp state checking bug
>
>revamp buildsunos script and build both sparcv7/sparcv9 for Solaris
>if on an Ultra with a 64bit system & compiler (Caseper Dik)
>
>open ipfilter device read only if we know we can
>
>print out better information for ICMP packets in ipmon
>
>move checking for source spoofed packets to a point where we can generate
>logs of them
>
>return EFAULT from ircopyptr/iwcopyptr
>
>don't do ioctl(SIOCGETFS) for auth stats
>
>fix up freeing mbufs for post-4.3BSD
>
>fix returning of inc from ftp proxy
>
>fix bugs with ipfs -R/-W (Caseper Dik)
>
>3.4.8 19/07/2000 - Released
>
>create fake opt_inet6.h for FreeBSD-4 compile as LKM
>
>add #ifdef's for KLD_MODULE sanity
>
>NAT fastroute'd packets which come out of return-*
>
>fix upper/lower case crap in ftp proxy and get seq# checking fixed up.
>
>3.4.7 08/07/2000 - Released
>
>make "ipf -y" lookup NAT if's which are unknown
>
>prepend line numbers to ioctl error messages in ipf/ipnat
>
>don't apply patches to FreeBSD twice
>
>allow for ip_len to be on an unaligned boundary early on in fr_precheck
>
>fix printing of icmp code when it is 0
>
>correct printing of port numbers in map rules with from/to
>
>don't allow fr_func to be called at securelevel > 0 or rules to be added
>if securelevel > 0 if they have a non-zero fr_func.
 1.9.4.5 20-Sep-2002  thorpej Sync with HEAD.
 1.9.4.4 04-May-2002  thorpej Update from trunk.
 1.9.4.3 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.9.4.2 28-Feb-2002  nathanw Catch up to -current.
 1.9.4.1 09-Apr-2001  nathanw Catch up with -current.
 1.9.2.3 27-Mar-2001  bouyer Sync with HEAD.
 1.9.2.2 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.9.2.1 09-Aug-2000  bouyer file ipl.h was added on branch thorpej_scsipi on 2000-11-20 18:10:34 +0000
 1.10.2.3 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototill work
 1.10.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.10.2.1 11-Feb-2002  jdolecek Sync w/ -current.
 1.13.4.1 24-Oct-2002  lukem Pull up upgrade to IPfilter 3.4.29 (requested by martti in ticket #905).
Affected files & revisions:

dist/ipf/HISTORY 1.14
dist/ipf/fils.c 1.17-1.21
dist/ipf/ipf.c 1.11-1.13
dist/ipf/ipfs.c 1.8-1.10
dist/ipf/ipft_ef.c 1.6-1.7
dist/ipf/ipft_td.c 1.6-1.7
dist/ipf/ipft_tx.c 1.7-1.8
dist/ipf/ipmon.c 1.12-1.17
dist/ipf/ipnat.c 1.11-1.12
dist/ipf/ipsend/ip_var.h 1.2
dist/ipf/ipsend/ipsend.c 1.8
dist/ipf/ipsend/iptests.c 1.5
dist/ipf/ipt.c 1.8-1.10
dist/ipf/kmem.c 1.8-1.10
dist/ipf/man/ipf.4 1.10
dist/ipf/man/ipf.5 1.8
dist/ipf/man/ipftest.1 1.3
dist/ipf/man/ipmon.8 1.10
dist/ipf/man/ipnat.5 1.9-1.10
dist/ipf/man/ipnat.8 1.4
dist/ipf/misc.c 1.7
dist/ipf/natparse.c 1.10
dist/ipf/parse.c 1.13
dist/ipf/printnat.c 1.8-1.10
dist/ipf/relay.c 1.5-1.6
dist/ipf/rules/example.9 1.2
etc/rc.d/ipnat 1.8
regress/sys/kern/ipf/Makefile 1.3-1.4
regress/sys/kern/ipf/dotest6 1.2
regress/sys/kern/ipf/expected/f13 1.1.1.2
regress/sys/kern/ipf/expected/i12 1.1.1.1
regress/sys/kern/ipf/expected/ni3 1.1.1.1
regress/sys/kern/ipf/expected/ni5 1.2
regress/sys/kern/ipf/input/f13 1.1.1.2
regress/sys/kern/ipf/input/ipv6.1 1.1.1.1
regress/sys/kern/ipf/input/ni3 1.1.1.1
regress/sys/kern/ipf/regress/i12 1.1.1.1
regress/sys/kern/ipf/regress/ipv6.1 1.1.1.1
regress/sys/kern/ipf/regress/ni3.ipf 1.1.1.1
regress/sys/kern/ipf/regress/ni3.nat 1.1.1.1
sys/arch/alpha/conf/ALPHA 1.169,1.171
sys/arch/amiga/conf/GENERIC 1.185-1.186
sys/arch/arc/conf/GENERIC 1.71-1.72
sys/arch/atari/conf/GENERIC.in 1.24-1.25
sys/arch/cats/conf/GENERIC 1.31-1.32
sys/arch/cobalt/conf/GENERIC 1.34-1.35
sys/arch/hp300/conf/GENERIC 1.83-1.84
sys/arch/i386/conf/CARDBUS 1.66-1.67
sys/arch/i386/conf/GENERIC 1.510,1.512
sys/arch/i386/conf/GENERIC_LAPTOP 1.58-1.59
sys/arch/i386/conf/GENERIC_PS2TINY 1.19-1.20
sys/arch/i386/conf/GENERIC_TINY 1.47-1.48
sys/arch/luna68k/conf/GENERIC 1.33-1.33
sys/arch/mac68k/conf/GENERIC 1.130-1.131
sys/arch/mac68k/conf/GENERICSBC 1.21-1.22
sys/arch/mac68k/conf/SMALLRAM 1.4-1.5
sys/arch/macppc/conf/GENERIC 1.142-1.143
sys/arch/mipsco/conf/GENERIC 1.21-1.22
sys/arch/mmeye/conf/GENERIC 1.44-1.45
sys/arch/news68k/conf/GENERIC 1.36-1.37
sys/arch/news68k/conf/GENERIC_TINY 1.18-1.19
sys/arch/newsmips/conf/GENERIC 1.50-1.51
sys/arch/ofppc/conf/GENERIC 1.56-1.57
sys/arch/pmax/conf/GENERIC 1.103-1.104
sys/arch/prep/conf/GENERIC 1.55-1.56
sys/arch/sbmips/conf/GENERIC 1.11-1.12
sys/arch/sgimips/conf/GENERIC 1.7-1.8
sys/arch/sparc/conf/GENERIC 1.138-1.139
sys/arch/sparc64/conf/GENERIC32 1.46-1.47
sys/arch/vax/conf/GENERIC 1.102-1.103
sys/arch/x68k/conf/ALL 1.55-1.56
sys/arch/x68k/conf/GENERIC 1.80-1.81
sys/lkm/netinet/if_ipl/mln_ipl.c 1.29
sys/netinet/fil.c 1.57-1.58
sys/netinet/ip_auth.c 1.29-1.30
sys/netinet/ip_compat.h 1.30-1.31
sys/netinet/ip_fil.c 1.81-1.86
sys/netinet/ip_fil.h 1.46-1.49
sys/netinet/ip_frag.c 1.33-1.34
sys/netinet/ip_frag.h 1.18
sys/netinet/ip_ftp_pxy.c 1.25-1.26
sys/netinet/ip_h323_pxy.c 1.5-1.6
sys/netinet/ip_log.c 1.22-1.23
sys/netinet/ip_nat.c 1.51-1.53
sys/netinet/ip_nat.h 1.27
sys/netinet/ip_netbios_pxy.c 1.4
sys/netinet/ip_proxy.c 1.35-1.36
sys/netinet/ip_proxy.h 1.18
sys/netinet/ip_state.c 1.41-1.42
sys/netinet/ip_state.h 1.23
sys/netinet/ipl.h 1.14
 1.14.6.2 19-Oct-2004  skrll Sync with HEAD
 1.14.6.1 03-Aug-2004  skrll Sync with HEAD
 1.15.2.1 13-Aug-2004  jmc branches: 1.15.2.1.2;
Pullup rev 1.16 (requested by christos in ticket #1727)

Sync up w. ipf 4.1.3
 1.15.2.1.2.1 06-Feb-2005  jmc Pull up revision 1.17 (requested by martti in ticket #1086)
Move ipf to sys/dist/ipf and sync w. trunk
 1.2 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 04-Sep-2004  manu branches: 1.1.2; 1.1.14;
IPv4 PIM support, based on submission from Pavlin Radoslavov on tech-net@ :
two new files I forgot to add on the first cvs commit.
 1.1.14.1 21-Jun-2006  yamt sync with head.
 1.1.2.5 11-Dec-2005  christos Sync with head.
 1.1.2.4 21-Sep-2004  skrll SYNC WITH HEAD.
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 04-Sep-2004  skrll file pim.h was added on branch ktrace-lwp on 2004-09-18 14:54:54 +0000
 1.4 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.3 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.2 10-Dec-2005  elad branches: 1.2.162; 1.2.164;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.1 04-Sep-2004  manu branches: 1.1.2; 1.1.14;
IPv4 PIM support, based on submission from Pavlin Radoslavov on tech-net@ :
two new files I forgot to add on the first cvs commit.
 1.1.14.1 21-Jun-2006  yamt sync with head.
 1.1.2.5 11-Dec-2005  christos Sync with head.
 1.1.2.4 21-Sep-2004  skrll SYNC WITH HEAD.
 1.1.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.1.2.1 04-Sep-2004  skrll file pim_var.h was added on branch ktrace-lwp on 2004-09-18 14:54:54 +0000
 1.2.164.1 10-Jun-2019  christos Sync with HEAD
 1.2.162.2 30-Sep-2018  pgoyette Sync with HEAD
 1.2.162.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.15 04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.14 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.13 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to be aware of it
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
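
The embedding described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the field names (in4p_pcb, in4p_ip_laddr, in6p_ip6_laddr), the DEMO_AF_* constants and the inpcb_set_laddr4() helper are hypothetical and heavily simplified, not the kernel's real layout. Only struct inpcb, struct in4pcb, struct in6pcb and the in4p_laddr(inp) accessor name come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address-family tags for this sketch only. */
#define DEMO_AF_INET	4
#define DEMO_AF_INET6	6

/* Common part shared by both families (fields are illustrative). */
struct inpcb {
	int		inp_af;		/* which family this PCB belongs to */
	uint16_t	inp_lport;	/* local port */
};

/* IPv4 PCB: embeds the common part first, then IPv4-only data. */
struct in4pcb {
	struct inpcb	in4p_pcb;
	uint32_t	in4p_ip_laddr;	/* hypothetical field name */
};

/* IPv6 PCB: same common part, larger family-specific data. */
struct in6pcb {
	struct inpcb	in6p_pcb;
	uint8_t		in6p_ip6_laddr[16];	/* hypothetical field name */
};

/*
 * Accessor in the style named by the commit message. Because struct inpcb
 * is the first member, a pointer to it may be converted back to a pointer
 * to the enclosing struct in4pcb.
 */
#define in4p_laddr(inp)	(((struct in4pcb *)(inp))->in4p_ip_laddr)

/* Hypothetical helper: callers hold only a struct inpcb pointer. */
static void
inpcb_set_laddr4(struct inpcb *inp, uint32_t addr)
{
	assert(inp->inp_af == DEMO_AF_INET);	/* only valid for IPv4 PCBs */
	in4p_laddr(inp) = addr;
}
```

With this arrangement an IPv4 PCB no longer pays for the IPv6 address storage, while generic code keeps passing struct inpcb pointers around.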
 1.12 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.11 11-Jan-2017  ozaki-r Get rid of unnecessary header inclusions
 1.10 26-Apr-2016  ozaki-r branches: 1.10.2;
Sweep unnecessary route.h inclusions
 1.9 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.8 10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.7 02-Dec-2014  christos use the new printing code.
 1.6 08-Sep-2014  joerg branches: 1.6.2;
Always use cprng_fast32, even during initialisation. No point in using
random(9).
 1.5 01-Jun-2013  pooka branches: 1.5.2;
Give portalgo a compile-time override; for cases where the default default
doesn't make enough sense to even consider it (a lot of outgoing connections
from rump kernels with local port 65535).
 1.4 07-Dec-2012  christos use __BITMAP_TYPE
 1.3 01-Dec-2012  christos switch from fd_set to using bitmap macros
 1.2 29-Nov-2012  christos Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.
 1.1 25-Jun-2012  christos branches: 1.1.2; 1.1.4;
rename rfc6056 -> portalgo, requested by yamt
 1.1.4.4 22-May-2014  yamt sync with head.

for reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.1.4.3 16-Jan-2013  yamt sync with (a bit old) head
 1.1.4.2 30-Oct-2012  yamt sync with head
 1.1.4.1 25-Jun-2012  yamt file portalgo.c was added on branch yamt-pagecache on 2012-10-30 17:22:46 +0000
 1.1.2.3 03-Dec-2017  jdolecek update from HEAD
 1.1.2.2 23-Jun-2013  tls resync from head
 1.1.2.1 25-Feb-2013  tls resync with head
 1.5.2.2 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.5.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.6.2.4 05-Feb-2017  skrll Sync with HEAD
 1.6.2.3 29-May-2016  skrll Sync with HEAD
 1.6.2.2 22-Sep-2015  skrll Sync with HEAD
 1.6.2.1 06-Apr-2015  skrll Sync with HEAD
 1.10.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.3 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.2 29-Nov-2012  christos Add a new sysctl to mark ports as reserved, so that they are not used in
the anonymous or reserved port allocation.
 1.1 25-Jun-2012  christos branches: 1.1.2; 1.1.4;
rename rfc6056 -> portalgo, requested by yamt
 1.1.4.3 16-Jan-2013  yamt sync with (a bit old) head
 1.1.4.2 30-Oct-2012  yamt sync with head
 1.1.4.1 25-Jun-2012  yamt file portalgo.h was added on branch yamt-pagecache on 2012-10-30 17:22:46 +0000
 1.1.2.1 25-Feb-2013  tls resync with head
 1.187 20-Jun-2025  roy inet: respect IP_TOS and IP_TTL for raw sockets
 1.186 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.185 29-Jun-2024  riastradh branches: 1.185.2;
netinet: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.184 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.183 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to be aware of it
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.182 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.181 13-Jun-2022  knakahara Refactor like in_pcb.c:r1.187 and in6_pcb.c:r1.168.

Use TAILQ_FOREACH instead of TAILQ_FOREACH_SAFE for inpt_queue.
rip_pcbnotify() doesn't use "ninph" pointer and doesn't remove elements.
 1.180 08-Sep-2020  christos Add IP_BINDANY, IPV6_BINDANY which can be used to bind to any address in
order to implement transparent proxies.
 1.179 24-Feb-2019  maxv RIP, RIP6, DDP, SCTP and SCTP6 lack a length check in their _connect()
functions. Fix the first three, and add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+9eaf98dad6ca738c250d@syzkaller.appspotmail.com
 1.178 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.177 11-May-2018  maxv branches: 1.177.2;
Make sure we have at least an IP header, and remove pointless XXXs (there
is no issue).
 1.176 28-Apr-2018  maxv Remove unused ipsec_var.h includes.
 1.175 12-Apr-2018  maxv Make 'opts' local to rip_sbappendaddr().
 1.174 12-Apr-2018  maxv Synchronize the code between raw_ip6.c<->icmp6.c<->raw_ip.c, so that it is
the same everywhere.
 1.173 12-Apr-2018  maxv Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.
 1.172 21-Mar-2018  roy Sprinkle more soroverflow().
 1.171 28-Feb-2018  maxv branches: 1.171.2;
Remove unused ipsec_private.h includes.
 1.170 28-Feb-2018  maxv (just forgot to commit this file, the message was)

Remove duplicate IPSEC_STATINC(IPSEC_STAT_IN_POLVIO), ipsec_in_reject
already increases it. IPSEC6_STATINC is now unused, so remove it too.
 1.169 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.168 14-Feb-2018  christos join lines where they fit.
 1.167 11-Dec-2017  ryo As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.166 10-Aug-2017  ryo Add IP_PKTINFO support for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
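
As a rough userland illustration of the control-message mechanism described in this entry, the sketch below builds a msghdr carrying one IP_PKTINFO item that names an output interface. It is a sketch under assumptions, not code from the commit: the helper name pktinfo_attach() is hypothetical, only the ipi_ifindex member is set (the member used for the source address differs between systems, so it is left zeroed here), and nothing is actually sent.

```c
/* Some libcs only expose struct in_pktinfo with extensions enabled. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: attach a single IP_PKTINFO control message to msg,
 * requesting output interface ifindex. cbuf must be suitably aligned and
 * at least CMSG_SPACE(sizeof(struct in_pktinfo)) bytes long.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
pktinfo_attach(struct msghdr *msg, void *cbuf, size_t cbuflen,
    unsigned int ifindex)
{
	struct in_pktinfo pi;
	struct cmsghdr *cm;

	assert(msg != NULL && cbuf != NULL);
	if (cbuflen < CMSG_SPACE(sizeof(pi)))
		return -1;

	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pi));

	/* Fill in the control-message header. */
	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = IP_PKTINFO;
	cm->cmsg_len = CMSG_LEN(sizeof(pi));

	/* Copy the payload; memcpy avoids alignment assumptions. */
	memset(&pi, 0, sizeof(pi));
	pi.ipi_ifindex = ifindex;
	memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
	return 0;
}
```

A caller would fill in msg_name and msg_iov as usual and then pass the msghdr to sendmsg(2) on a SOCK_DGRAM or SOCK_RAW socket.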
 1.165 06-Jul-2017  christos Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define, reducing compat pollution from
5 places to 1.
 1.164 20-Apr-2017  ozaki-r branches: 1.164.4;
Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.163 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is a layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.162 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.161 29-Sep-2016  roy branches: 1.161.2;
Now that we disallow sending or receiving from invalid addresses,
allow binding to tentative addresses.
 1.160 26-Aug-2016  roy Allow bind to detached INET addresses.
 1.159 01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.158 12-May-2016  ozaki-r branches: 1.158.2;
Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.157 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.156 21-Jan-2016  riastradh Revert previous: ran cvs commit when I meant cvs diff. Sorry!

Hit up-arrow one too few times.
 1.155 21-Jan-2016  riastradh Give proper prototype to ip_output.
 1.154 20-Jan-2016  riastradh Give proper prototype to rip_output.
 1.153 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.152 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.151 02-May-2015  roy Add IPv4 address flags IN_IFF_TENTATIVE, IN_IFF_DUPLICATED and
IN_IFF_DETACHED to mimic the IPv6 address behaviour.
Add SIOCGIFAFLAG_IN ioctl to retrieve the address flag via the
ifreq structure.
Add IPv4 DAD detection via the ARP methods described in RFC 5227.
Add sysctls net.inet.ip.dad_count and net.inet.arp.debug.

Discussed on tech-net@
 1.150 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.149 25-Apr-2015  rtr make rip_connect_pcb take sockaddr_in * instead of mbuf *
make rip_connect_pcb static since it appears to be used only in raw_ip.c

moves m_len check to callers which is a small duplication of code
that will go away when the callers are converted to receive sockaddr *.
 1.148 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.147 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.146 10-Nov-2014  maxv branches: 1.146.2;
Do not uselessly include <sys/malloc.h>.
 1.145 09-Aug-2014  rtr branches: 1.145.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.144 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.143 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.142 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.141 03-Aug-2014  rtr req cannot be PRU_SENDOOB here as per KASSERT() earlier in the
rip_usrreq() function.

- KASSERT(!control || (req == PRU_SEND || req == PRU_SENDOOB));
+ KASSERT(!control || (req == PRU_SEND));
 1.140 02-Aug-2014  rtr restore splsoftnet() in various usrreqs that were removed during the PRU
splits. we will properly review removal after the PRU split work is
complete.
 1.139 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.138 31-Jul-2014  ozaki-r Define IFNET_EMPTY() and replace !IFNET_FIRST() with it

No functional change.
 1.137 30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.136 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.135 23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.134 14-Jul-2014  rtr fix fat fingered KASSERT(solocked(0)) -> KASSERT(solocked(so)) mistake.

spotted by Takahiro HAYASHI
 1.133 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.132 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.131 07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.130 07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifies a return of
success.

additional review will be done to determine if the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.129 07-Jul-2014  rtr * have pr_stat return EOPNOTSUPP consistently for all protocols that do
not fill in struct stat instead of returning success.

* in pr_stat remove all checks for non-NULL so->so_pcb except where the
pcb is actually used (i.e. cases where we don't return EOPNOTSUPP).

proposed on tech-net@
 1.128 06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.127 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.126 23-Jun-2014  rtr where appropriate rename xxx_ioctl() struct mbuf * parameters from
`control' to `ifp' after split from xxx_usrreq().

sys_socket.c
fix wrapping of arguments to be consistent with other function calls
in the file after replacing pr_usrreq() call with pr_ioctl() which
required one less argument.

link_proto.c
fix indentation of parameters in link_ioctl() prototype to be
consistent with the rest of the file.

discussed with rmind@
 1.125 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_usrreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_usrreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_usrreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL in xxx_usrreq() function comments.
* fix various comment references for xxx_usrreq() that mentioned
PRU_CONTROL as xxx_usrreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.124 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.123 22-May-2014  rmind - Make ip_setmoptions(), ip_getmoptions() and ip_pcbopts() static.
- ip_output: eliminate 7th variadic argument; IP_RETURNMTU is flag
always used to store MTU size into struct inpcb::inp_errormtu.
- Clean up these routines: reduce #ifdefs, variable scopes, etc.
 1.122 20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.121 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.120 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
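
The idea behind this series of commits, a table of per-request function pointers with a catch-all for requests that have not yet been split out, can be sketched roughly as follows. All names here (usrreqs_sketch, raw_bind, REQ_SEND and so on) are hypothetical and cut down for illustration; this is not the actual struct pr_usrreqs layout.

```c
#include <errno.h>
#include <stddef.h>

struct socket;			/* opaque in this sketch */

/* Hypothetical, reduced analogue of a per-protocol request table. */
struct usrreqs_sketch {
	int (*pr_bind)(struct socket *);
	int (*pr_listen)(struct socket *);
	/* catch-all for requests that have not been split out yet */
	int (*pr_generic)(struct socket *, int);
};

enum { REQ_CONNECT = 1, REQ_SEND = 2 };	/* illustrative request codes */

int
raw_bind(struct socket *so)
{
	(void)so;
	return 0;
}

int
raw_listen(struct socket *so)
{
	(void)so;
	return EOPNOTSUPP;	/* a raw socket cannot listen */
}

int
raw_generic(struct socket *so, int req)
{
	(void)so;
	switch (req) {		/* the old-style switch shrinks over time */
	case REQ_CONNECT:
	case REQ_SEND:
		return 0;
	default:
		return EINVAL;
	}
}

const struct usrreqs_sketch raw_usrreqs = {
	.pr_bind = raw_bind,
	.pr_listen = raw_listen,
	.pr_generic = raw_generic,
};
```

A caller then invokes raw_usrreqs.pr_bind(so) directly instead of passing a request code through one large switch, which lets each entry point have its own, properly typed parameter list.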
 1.119 18-May-2014  rmind Use IFNET_FIRST() rather than open coding ifnet access.
 1.118 25-Feb-2014  pooka branches: 1.118.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.117 23-Nov-2013  christos convert from CIRCLEQ to TAILQ.
 1.116 05-Jun-2013  christos branches: 1.116.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.115 05-Feb-2013  joerg Remove remnants of AF_IMPLINK.
 1.114 22-Mar-2012  drochner branches: 1.114.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.113 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.112 17-Jul-2011  joerg branches: 1.112.2; 1.112.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.111 09-Dec-2009  dyoung Remove superfluous cast of a pointer to void *.

Compare a pointer with NULL, not 0.

No functional change intended.
 1.110 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.109 19-Jan-2009  christos Provide compatibility to the old timeval SCM_TIMESTAMP messages.
 1.108 06-Aug-2008  plunky branches: 1.108.2;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.107 24-Apr-2008  ad branches: 1.107.2; 1.107.4; 1.107.8;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.106 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.105 12-Apr-2008  thorpej branches: 1.105.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
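
A rough sketch of the per-CPU counter pattern described above, using plain arrays. The names and sizes (percpu_stats, NCPU_SKETCH, stat_collate) are assumptions for illustration and do not correspond to the kernel's percpu(9) interfaces.

```c
#include <stddef.h>
#include <stdint.h>

#define NCPU_SKETCH	4	/* hypothetical CPU count */
#define NSTAT_SKETCH	3	/* hypothetical number of counters */

/* One row of counters per CPU; each CPU only ever writes its own row. */
uint64_t percpu_stats[NCPU_SKETCH][NSTAT_SKETCH];

/* Bump counter `idx` on behalf of CPU `cpu`; no cross-CPU locking needed. */
void
stat_inc(unsigned int cpu, unsigned int idx)
{
	percpu_stats[cpu][idx]++;
}

/* Sum every CPU's row into `out`, which must have NSTAT_SKETCH entries. */
void
stat_collate(uint64_t *out)
{
	for (size_t i = 0; i < NSTAT_SKETCH; i++) {
		out[i] = 0;
		for (size_t c = 0; c < NCPU_SKETCH; c++)
			out[i] += percpu_stats[c][i];
	}
}
```

The hot path touches only CPU-local memory, and the cost of summation is paid only when someone asks for the totals.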
 1.104 07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.103 06-Feb-2008  matt branches: 1.103.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
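
As a simplified illustration of the shuffle idea only: the sketch below draws IDs with an incremental Fisher-Yates shuffle over the whole 16-bit space, so no ID repeats within one pass. It does not model the sliding window of the committed code, the names (idgen_sketch, idgen_next) are hypothetical, and rand() stands in for whatever random source the kernel uses.

```c
#include <stdint.h>
#include <stdlib.h>

#define ID_SPACE 65536u

struct idgen_sketch {
	uint16_t ids[ID_SPACE];	/* always a permutation of 0..65535 */
	uint32_t next;		/* index of the next slot to draw */
};

void
idgen_init(struct idgen_sketch *g)
{
	for (uint32_t i = 0; i < ID_SPACE; i++)
		g->ids[i] = (uint16_t)i;
	g->next = 0;
}

/*
 * One Fisher-Yates step: swap slot i with a random slot in [i, ID_SPACE)
 * and return the value that lands in slot i. rand() is a placeholder and
 * the modulo introduces a small bias; neither is suitable for real use.
 */
uint16_t
idgen_next(struct idgen_sketch *g)
{
	uint32_t i, j;
	uint16_t tmp;

	if (g->next == ID_SPACE)
		g->next = 0;	/* start a fresh pass over the table */
	i = g->next++;
	j = i + (uint32_t)rand() % (ID_SPACE - i);

	tmp = g->ids[i];
	g->ids[i] = g->ids[j];
	g->ids[j] = tmp;
	return g->ids[i];
}
```

Within one pass of 65536 draws every ID appears exactly once; an ID may still recur shortly after a pass boundary, which a window-based scheme is meant to avoid.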
 1.102 21-Dec-2007  matt Add fix for ip_id information leakage. Since the leakage information is
primarily used with TCP SYN and RST packets and such packets are less than
the smallest sized packet that an IP stack is allowed to fragment, we simply
set ip_id to 0 for all packets 68 bytes or less.
 1.101 27-Nov-2007  christos branches: 1.101.2; 1.101.6;
require that the options argument is the right size, not that it is greater
or equal to the requested size. Suggested by Matt Thomas.
 1.100 19-Sep-2007  dyoung branches: 1.100.6;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.
 1.99 02-Sep-2007  dyoung m_copym(..., 0, M_COPYALL, ...) -> m_copypacket(..., ...).
 1.98 02-Sep-2007  dyoung m_copy() was deprecated, apparently, long ago. m_copy(...) ->
m_copym(..., M_DONTWAIT).
 1.97 12-May-2007  dyoung branches: 1.97.2; 1.97.6; 1.97.8;
KNF. Use sockaddr_in_init(). Shorten staircases. No functional
changes intended.
 1.96 04-Mar-2007  christos branches: 1.96.2; 1.96.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.95 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.94 25-Oct-2006  elad branches: 1.94.4;
Introduce KAUTH_REQ_NETWORK_SOCKET_OPEN, to check if opening a socket is
allowed. It takes three int * arguments indicating domain, type, and
protocol. Replace previous KAUTH_REQ_NETWORK_SOCKET_RAWSOCK with it (but
keep it still).

Places that used to explicitly check for privileged context now don't
need it anymore, so I replaced these with an XXX comment indicating it for
future reference.

Documented and updated examples as well.
 1.93 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.92 19-Sep-2006  elad Remove ugly (void *) casts from network scope authorization wrapper and
calls to it.

While here, adapt code for system scope listeners to avoid some more
casts (forgotten in previous run).

Update documentation.
 1.91 08-Sep-2006  elad branches: 1.91.2;
First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.90 23-Jul-2006  ad branches: 1.90.4;
Use the LWP cached credentials where sane.
 1.89 14-May-2006  elad integrate kauth.
 1.88 11-Dec-2005  christos branches: 1.88.4; 1.88.6; 1.88.8; 1.88.10; 1.88.12;
merge ktrace-lwp.
 1.87 29-Apr-2005  yamt branches: 1.87.2;
move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.86 11-Mar-2005  atatat Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.
 1.85 10-Mar-2005  atatat Change types of kern.file2 and net.*.*.pcblist to NODE
 1.84 09-Mar-2005  atatat Add the following nodes to the sysctl tree:

net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist

which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
 1.83 03-Feb-2005  perry ANSIfy function declarations
 1.82 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.81 04-Sep-2004  manu branches: 1.81.4; 1.81.6;
IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.80 07-May-2004  jonathan Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.79 26-Apr-2004  matt Remove #else clause of __STDC__
 1.78 19-Nov-2003  jonathan branches: 1.78.2;
Patch back support for (badly) randomized IP ids, by request:

* Include "opt_inet.h" everywhere IP-ids are generated with ip_newid(),
so the RANDOM_IP_ID option is visible. Also in ip_id(), to ensure
the prototype for ip_randomid() is made visible.

* Add new sysctl to enable randomized IP-ids, provided the kernel was
configured with RANDOM_IP_ID. (The sysctl defaults to zero, and is
a read-only zero if RANDOM_IP_ID is not configured).

Note that the implementation of randomized IP ids is still defective,
and should not be enabled at all (even if configured) without
very careful deliberation. Caveat emptor.
 1.77 17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.76 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.75 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.74 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.73 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.72 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.71 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.70 29-Jun-2003  fvdl branches: 1.70.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.69 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.68 27-May-2003  itojun can't use M_WAIT here, i believe.
 1.67 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.66 30-Jan-2003  thorpej M_SOOPTS -> MT_SOOPTS
 1.65 07-Nov-2002  thorpej In the IP_HDRINCL case of rip_output(), if the mbuf is read-only
then copy the header into a new mbuf before modifying it.

Fixes PR 18809. Thanks to Chuq Silvers for diagnosing it.
 1.64 22-Oct-2002  simonb Oops, still need the call to va_arg() to advance the args pointer.
 1.63 22-Oct-2002  simonb "off" in rip_input() is set but not used, remove it.
static global "ripsrc" is never used, remove it.
 1.62 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.61 09-Jun-2002  itojun whitespace
 1.60 21-Dec-2001  itojun branches: 1.60.8;
have rip_ctlinput to notify routing changes to raw sockets
(protosw change to be done). sync with kame
 1.59 13-Nov-2001  lukem add RCSIDs
 1.58 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.57 25-Jul-2001  itojun branches: 1.57.4;
allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone access to pcb (do not reveal an inconsistent one).
sync with kame
 1.56 03-Jul-2001  itojun branches: 1.56.2;
call in{,6}_pcbpurgeif0() before in{,6}_purgeif().
 1.55 26-Feb-2001  itojun branches: 1.55.2;
make sure to validate packet against ipsec policy.
 1.54 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.53 30-Mar-2000  augustss branches: 1.53.4;
Remove register declarations.
 1.52 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.51 17-Feb-2000  darrenr Change the use of pfil hooks. There is no longer a single list of all
pfil information, instead, struct protosw now contains a structure
which contains list heads, etc. The per-protosw pfil struct is passed
to pfil_hook_get(), along with an in/out flag to get the head of the
relevant filter list. This has been done for only IPv4 and IPv6, at
present, with these patches only enabling filtering for IPPROTO_IP and
IPPROTO_IPV6, although it is possible to have tcp/udp, etc, dedicated
filters now also. The ipfilter code has been updated to only filter
IPv4 packets - next major release of ipfilter is required for ipv6.
 1.50 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.49 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.48 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.47 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.46 13-Sep-1999  itojun branches: 1.46.2; 1.46.8;
- Call in{,6}_pcbdetach if ipsec initialization fails during PRU_ATTACH.
This situation happens on severe memory shortage. We may need more
improvements here and there.
- Grab IEEE802 address from IFT_ETHER card, even if the card is
inserted after bootup time. Is there any other card that can be
inserted afterwards? pcmcia fddi card? :-P
- RFC2373 u bit handling suggests that we SHOULD NOT copy interface id from
ethernet card to pseudo interface, when ethernet card has IEEE802/EUI64
with u bit != 0 (this means that IEEE802/EUI64 is not universally unique).
Do not use such address as, for example, interface id for gif interface.
(I have such an ethernet card myself)
This may change interface id for your gif interface. be careful upgrading
rc files.

(sync with recent KAME)
 1.45 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.44 05-Jul-1999  darrenr Call icmp_error() at the bottom of rip_input IFF rip_input is the handler
for the protocol in the specified packet.
Fix statistic gathering to not make bogus increments of ips_delivered and
ips_noproto for cases where rip_input() is called by a protocol handler
(such as icmp_input or igmp_input) which has already processed the packet.
 1.43 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.42 30-Jan-1999  thorpej branches: 1.42.4; 1.42.6;
Make programs that use raw IP work again; trim the header length from ip_len
before handing the packet off to the socket.
 1.41 03-Apr-1998  thorpej branches: 1.41.6;
Fix a bug which would cause a panic in soreceive() if multiple raw
receivers ask for ancillary data.

Noted by Francis Dupont <Francis.Dupont@inria.fr> on tech-net.
 1.40 12-Jan-1998  scottr Use option header file for MROUTING
 1.39 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.38 16-Nov-1997  mycroft On output, if the packet length doesn't match the length in the IP header,
drop the packet with EINVAL.
 1.37 14-Oct-1997  matt branches: 1.37.2;
Add support for returning maximum supported MTU when ip_output fails with
EMSGSIZE.
 1.36 11-Jan-1997  thorpej branches: 1.36.10;
Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
 1.35 25-Oct-1996  thorpej In rip_output(), sanity check the length of the packet to be transmitted.
If it's larger than IP_MAXPACKET, return an error condition.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
 1.34 16-Sep-1996  mycroft Make sure the sin_zero fields are filled.
 1.33 15-Sep-1996  mycroft Hash unconnected PCBs.
 1.32 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.31 23-Jun-1996  mycroft Return ENOPROTOOPT rather than picking pseudo-random error values.
Don't allow SIOCGET{VIF,SG}CNT from sockets other than the multicast router.
Restructure rip_ctloutput() like ip_ctloutput(), and fix memory leaks.
 1.30 28-May-1996  pk Prototype new rip_*() functions.
 1.29 24-May-1996  mycroft Move some code into a separate rip_bind() function.
 1.28 23-May-1996  mycroft Make sure the control mbufs are freed in all cases.
 1.27 23-May-1996  mycroft Minor changes to make this more like other protocols. Also, fix some return
values.
 1.26 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.25 18-Feb-1996  christos branches: 1.25.4;
Fix PR/2095 options MROUTING did not compile.
 1.24 13-Feb-1996  christos netinet prototypes
 1.23 31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.22 30-Nov-1995  pk Handle PRU_CONTROL (David Maltz; PR#1664).
 1.21 18-Jun-1995  cgd branches: 1.21.2;
convert pcb lists to CIRCLEQs, so that the end can be looked at more
easily, and so that the original (insque/remque) logic can be effectively
mimicked. (This fixes a bug in the previous set of list changes.)
also (since terminator is no longer null) reinstate uninitted list checks,
but mark them XXX.
 1.20 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.19 04-Jun-1995  mycroft Clean up many more casts.
 1.18 31-May-1995  mycroft Integrate multicast 3.5 distribution, with several bugs fixed and general
cleanup. This is a (working) snapshot of work in progress.
 1.17 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.16 02-Mar-1995  glass Fix for two bad tests in the raw IP socket input code. Only affected
raw sockets that were bound to a local address and/or connected to a
foreign address. Fix from Dan McDonald <danmcd@itd.nrl.navy.mil>
 1.15 12-Jan-1995  mycroft Fix mbuf leak in rip_ctloutput().
 1.14 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.13 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.12 10-Feb-1994  mycroft Format police.
 1.11 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.10 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.9 09-Jan-1994  mycroft Prototype the rest.
 1.8 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.7 18-Dec-1993  mycroft Canonicalize all #includes.
 1.6 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.5 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.4 21-May-1993  cgd add packet size check for raw IP provided by Paul Antonov <apg@apg.kiae.su>,
to fix the "traceroute foohost 2000 == panic" problem.
 1.3 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.2 21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.21.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.25.4.2 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.25.4.1 10-Nov-1996  thorpej Update from trunk:
- Make ip_len and ip_off unsigned.
- Make sure we don't accept or transmit packets larger than the
maximum IP packet size.
This fixes the so-called `death ping' bug.

Sum of work from Bill Fenner <fenner@parc.xerox.com>,
Kevin Lahey <kml@nas.nasa.gov>, and myself.

Thanks to Curt Sampson, Jukka Marin, and Kevin Lahey for testing
this under NetBSD 1.2
 1.36.10.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.37.2.1 18-Nov-1997  mellon Pull rev 1.38 up from trunk (mycroft)
 1.41.6.1 11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.42.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.42.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.42.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.42.4.2 02-Aug-1999  thorpej Update from trunk.
 1.42.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.46.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.46.2.3 12-Mar-2001  bouyer Sync with HEAD.
 1.46.2.2 11-Feb-2001  bouyer Sync with HEAD.
 1.46.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.53.4.2 06-Apr-2001  he Pull up revision 1.54 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.53.4.1 26-Feb-2001  he Pull up revision 1.55 (requested by itojun):
Make sure to validate packet against ipsec policy.
 1.55.2.6 07-Nov-2002  thorpej Sync with HEAD.
 1.55.2.5 27-Aug-2002  nathanw Catch up to -current.
 1.55.2.4 20-Jun-2002  nathanw Catch up to -current.
 1.55.2.3 08-Jan-2002  nathanw Catch up to -current.
 1.55.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.55.2.1 24-Aug-2001  nathanw Catch up with -current.
 1.56.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.56.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.56.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.56.2.1 03-Aug-2001  lukem update to -current
 1.57.4.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.60.8.2 29-Aug-2002  gehenna catch up with -current.
 1.60.8.1 20-Jun-2002  gehenna catch up with -current.
 1.70.2.7 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.70.2.6 01-Apr-2005  skrll Sync with HEAD.
 1.70.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.70.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.70.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.70.2.2 03-Aug-2004  skrll Sync with HEAD
 1.70.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.78.2.1 10-May-2004  tron Pull up revision 1.80 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.81.6.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.81.6.1 12-Feb-2005  yamt sync with head.
 1.81.4.1 29-Apr-2005  kent sync with -current
 1.87.2.8 11-Feb-2008  yamt sync with head.
 1.87.2.7 21-Jan-2008  yamt sync with head
 1.87.2.6 07-Dec-2007  yamt sync with head
 1.87.2.5 27-Oct-2007  yamt sync with head.
 1.87.2.4 03-Sep-2007  yamt sync with head.
 1.87.2.3 26-Feb-2007  yamt sync with head.
 1.87.2.2 30-Dec-2006  yamt sync with head.
 1.87.2.1 21-Jun-2006  yamt sync with head.
 1.88.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.88.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.88.10.2 10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.88.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.88.8.3 14-Sep-2006  yamt sync with head.
 1.88.8.2 11-Aug-2006  yamt sync with head
 1.88.8.1 24-May-2006  yamt sync with head.
 1.88.6.1 01-Jun-2006  kardel Sync with head.
 1.88.4.2 09-Sep-2006  rpaulo sync with head
 1.88.4.1 05-Feb-2006  rpaulo inpcb_hdr is gone.
 1.90.4.1 18-Nov-2006  ad Sync with head.
 1.91.2.2 10-Dec-2006  yamt sync with head.
 1.91.2.1 22-Oct-2006  yamt sync with head
 1.94.4.3 17-May-2007  yamt sync with head.
 1.94.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.94.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.96.4.1 11-Jul-2007  mjf Sync with head.
 1.96.2.2 09-Oct-2007  ad Sync with head.
 1.96.2.1 08-Jun-2007  ad Sync with head.
 1.97.8.3 23-Mar-2008  matt sync with HEAD
 1.97.8.2 09-Jan-2008  matt sync with HEAD
 1.97.8.1 06-Nov-2007  matt sync with HEAD
 1.97.6.3 03-Dec-2007  joerg Sync with HEAD.
 1.97.6.2 02-Oct-2007  joerg Sync with HEAD.
 1.97.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.97.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.100.6.3 18-Feb-2008  mjf Sync with HEAD.
 1.100.6.2 27-Dec-2007  mjf Sync with HEAD.
 1.100.6.1 08-Dec-2007  mjf Sync with HEAD.
 1.101.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.101.2.1 26-Dec-2007  ad Sync with head.
 1.103.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.103.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.105.2.1 18-May-2008  yamt sync with head.
 1.107.8.1 19-Oct-2008  haad Sync with HEAD.
 1.107.4.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.107.2.2 11-Mar-2010  yamt sync with head
 1.107.2.1 04-May-2009  yamt sync with head.
 1.108.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.112.6.2 05-Apr-2012  mrg sync to latest -current.
 1.112.6.1 18-Feb-2012  mrg merge to -current.
 1.112.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.112.2.1 17-Apr-2012  yamt sync with head
 1.114.2.4 03-Dec-2017  jdolecek update from HEAD
 1.114.2.3 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.114.2.2 23-Jun-2013  tls resync from head
 1.114.2.1 25-Feb-2013  tls resync with head
 1.116.2.4 18-May-2014  rmind sync with head
 1.116.2.3 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.116.2.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.116.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.118.2.1 10-Aug-2014  tls Rebase.
 1.145.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.146.2.8 28-Aug-2017  skrll Sync with HEAD
 1.146.2.7 05-Feb-2017  skrll Sync with HEAD
 1.146.2.6 05-Oct-2016  skrll Sync with HEAD
 1.146.2.5 29-May-2016  skrll Sync with HEAD
 1.146.2.4 19-Mar-2016  skrll Sync with HEAD
 1.146.2.3 22-Sep-2015  skrll Sync with HEAD
 1.146.2.2 06-Jun-2015  skrll Sync with HEAD
 1.146.2.1 06-Apr-2015  skrll Sync with HEAD
 1.158.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.158.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.158.2.2 04-Nov-2016  pgoyette Sync with HEAD
 1.158.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.161.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.164.4.2 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.164.4.1 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
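A minimal sketch of attaching IP_PKTINFO to the control part of a message before calling sendmsg(2). The helper name set_pktinfo() is hypothetical, and the choice of ipi_addr as the source-address field is an assumption: the field that carries the source address differs between systems, so check ip(4) on the target.

```c
#define _GNU_SOURCE		/* exposes struct in_pktinfo on glibc */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Fill msg's control area with a single IP_PKTINFO control message.
 * cbuf must be suitably aligned for struct cmsghdr and at least
 * CMSG_SPACE(sizeof(struct in_pktinfo)) bytes long.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
set_pktinfo(struct msghdr *msg, void *cbuf, size_t cbuflen,
    struct in_addr src, unsigned int ifindex)
{
	struct in_pktinfo pi;
	struct cmsghdr *cm;

	assert(msg != NULL && cbuf != NULL);
	if (cbuflen < CMSG_SPACE(sizeof(pi)))
		return -1;

	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(pi));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = IP_PKTINFO;
	cm->cmsg_len = CMSG_LEN(sizeof(pi));

	memset(&pi, 0, sizeof(pi));
	pi.ipi_addr = src;		/* assumed source-address field */
	pi.ipi_ifindex = ifindex;	/* output interface, 0 for "any" */
	memcpy(CMSG_DATA(cm), &pi, sizeof(pi));
	return 0;
}
```

After this call the caller fills in msg_name and msg_iov as usual and passes the msghdr to sendmsg(2) on a SOCK_DGRAM or SOCK_RAW socket.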
 1.171.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.171.2.4 21-May-2018  pgoyette Sync with HEAD
 1.171.2.3 02-May-2018  pgoyette Synch with HEAD
 1.171.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.171.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.177.2.1 10-Jun-2019  christos Sync with HEAD
 1.185.2.1 02-Aug-2025  perseant Sync with HEAD
 1.9 25-Jun-2012  christos rename rfc6056 -> portalgo, requested by yamt
 1.8 21-Jun-2012  yamt for the default "bsd" algorithm, restore the pre rfc6056 changes behaviour.
fix anonportmin/max.

probably other algorithms need similar fixes.
 1.7 21-Jun-2012  yamt whitespace and cosmetics. no functional changes.
 1.6 13-Apr-2012  yamt comment
 1.5 15-Mar-2012  gson Fix random kernel memory corruption by algo_doublehash(). And by
"random" I don't mean just "arbitrary" as in using an uninitialized
pointer, but random as in corrupting the contents of memory addresses
chosen using a crypto-strength random number generator.

I believe this is the likely cause of multiple reports of random
crashes over the last six months, including kern/45677 and kern/46096.
 1.4 19-Nov-2011  tls branches: 1.4.2; 1.4.4;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.3 25-Sep-2011  mrg branches: 1.3.2;
make this build without INET6.
also, fix the rfc6056algo passed to sysctl_rfc6056_helper
(it was backwards for inet4/inet6.)
 1.2 24-Sep-2011  christos disable debugging
 1.1 24-Sep-2011  christos Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.3.2.2 30-Oct-2012  yamt sync with head
 1.3.2.1 17-Apr-2012  yamt sync with head
 1.4.4.1 17-Mar-2012  bouyer Pull up following revision(s) (requested by gson in ticket #122):
sys/netinet/rfc6056.c: revision 1.5
Fix random kernel memory corruption by algo_doublehash(). And by
"random" I don't mean just "arbitrary" as in using an uninitialized
pointer, but random as in corrupting the contents of memory addresses
chosen using a crypto-strength random number generator.
I believe this is the likely cause of multiple reports of random
crashes over the last six months, including kern/45677 and kern/46096.
 1.4.2.2 29-Apr-2012  mrg sync to latest -current.
 1.4.2.1 05-Apr-2012  mrg sync to latest -current.
 1.4 25-Jun-2012  christos rename rfc6056 -> portalgo, requested by yamt
 1.3 22-Jun-2012  christos PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.2 24-Sep-2011  christos branches: 1.2.2;
install the header.
 1.1 24-Sep-2011  christos Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.2.2.1 30-Oct-2012  yamt sync with head
 1.7 28-Feb-2025  andvar Fix various typos in comments.
 1.6 17-Mar-2024  andvar branches: 1.6.2;
Add missing "e" in few words, in comments and one log message.
 1.5 24-Oct-2021  andvar fix various typos in comments, mainly copypasta.
 1.4 06-Sep-2021  andvar fix various typos in comments.
 1.3 03-Jun-2019  msaitoh Fix typo in comment (s/seperate/separate/).
 1.2 27-Jun-2017  rjs branches: 1.2.4; 1.2.8;
Pack structs.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.3 28-Aug-2017  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.8.1 10-Jun-2019  christos Sync with HEAD
 1.2.4.2 03-Dec-2017  jdolecek update from HEAD
 1.2.4.1 27-Jun-2017  jdolecek file sctp.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.6.2.1 02-Aug-2025  perseant Sync with HEAD
 1.14 11-Apr-2024  knakahara Fix invalid IPv6 route when an ipsecif(4) tunnel is deleted. Pointed out by ohishi@IIJ.

The pointed bug is fixed by modification in nd6_need_cache().
Others are similar bugs.

XXX pullup-9, 10
 1.13 09-Feb-2024  andvar fix spelling mistakes, mainly in comments and log messages.
 1.12 25-Jun-2019  rjs branches: 1.12.28;
Split out the prototypes for add/delete address into a separate header file.
 1.11 28-Jun-2017  rjs branches: 1.11.4; 1.11.8;
Put back some commented out code.
 1.10 17-Jan-2017  ozaki-r Fix build w/ SCTP and w/o SCTP_DEBUG
 1.9 16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.8 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.7 15-Dec-2016  ozaki-r branches: 1.7.2;
Restore nd6.h inclusion to resolve implicit dependency
 1.6 13-Dec-2016  ozaki-r Remove unnecessary inclusions of nd6.h
 1.5 07-Jul-2016  ozaki-r branches: 1.5.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.4 07-Jul-2016  ozaki-r Use IFADDR_FOREACH instead of IFADDR_FOREACH_SAFE

No item is removed in the loop.
 1.3 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.2 03-Apr-2016  mlelstv Replace generic queue macros with IFNET/IFADDR macros.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.7 28-Aug-2017  skrll Sync with HEAD
 1.1.2.6 05-Feb-2017  skrll Sync with HEAD
 1.1.2.5 09-Jul-2016  skrll Sync with HEAD
 1.1.2.4 29-May-2016  skrll Sync with HEAD
 1.1.2.3 22-Apr-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_asconf.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.5.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.5.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.7.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.11.8.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.11.4.2 03-Dec-2017  jdolecek update from HEAD
 1.11.4.1 28-Jun-2017  jdolecek file sctp_asconf.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.12.28.1 18-Apr-2024  martin Pull up following revision(s) (requested by knakahara in ticket #659):

sys/netinet6/in6_ifattach.c: revision 1.122
sys/netinet/sctp_asconf.c: revision 1.14
sys/netinet6/nd6.c: revision 1.282

Fix invalid IPv6 route when an ipsecif(4) tunnel is deleted. Pointed out by ohishi@IIJ.
The pointed bug is fixed by modification in nd6_need_cache().
Others are similar bugs.
 1.4 25-Jun-2019  rjs Split out the prototypes for add/delete address into a separate header file.
 1.3 08-Jun-2019  rjs Don't need 'extern' for function prototypes.
 1.2 24-Feb-2019  kamil Appease GCC7 in sctp_asconf.h

Do not declare types inside function parameter list.
Add declarations of types before these function prototypes.
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18; 1.1.22;
Add core networking support for SCTP.
 1.1.22.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.1.22.1 10-Jun-2019  christos Sync with HEAD
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_asconf.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_asconf.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.5 02-Feb-2024  andvar fix various typos in comments.
 1.4 18-May-2022  andvar s/yeild/yield/
 1.3 01-Jan-2022  msaitoh s/implemenation/implementation/ in comment.
 1.2 27-Dec-2019  msaitoh s/inital/initial/
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18; 1.1.22;
Add core networking support for SCTP.
 1.1.22.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_constants.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_constants.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.3 13-Aug-2019  rjs Remove unused checksum code.
 1.2 12-Aug-2016  jdolecek branches: 1.2.14; 1.2.18;
sprinkle const on sctp_crc_c[]
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.3 05-Oct-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_crc32.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.18.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.2.14.2 03-Dec-2017  jdolecek update from HEAD
 1.2.14.1 12-Aug-2016  jdolecek file sctp_crc32.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.2 13-Aug-2019  rjs Remove unused checksum code.
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18; 1.1.22;
Add core networking support for SCTP.
 1.1.22.1 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_crc32.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_crc32.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18;
Add core networking support for SCTP.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_hashdriver.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_hashdriver.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18;
Add core networking support for SCTP.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_hashdriver.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_hashdriver.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.4 08-Dec-2023  andvar fix triple s typos in comments.
 1.3 05-Dec-2021  msaitoh s/convience/convenience/ in comment.
 1.2 27-Jun-2017  rjs branches: 1.2.4;
Pack structs.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.3 28-Aug-2017  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_header.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.4.2 03-Dec-2017  jdolecek update from HEAD
 1.2.4.1 27-Jun-2017  jdolecek file sctp_header.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.18 17-Apr-2025  andvar s/valdid/valid/ and s/valdiation/validation/ in comments.
 1.17 04-Dec-2024  andvar s/transmite/transmitte/ in comments.
 1.16 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.15 09-Feb-2024  andvar branches: 1.15.2;
s/anthing/anything/ and s/be to/too/ in comments.
 1.14 15-Jan-2024  andvar Fix few typos in comments, mainly s/argment/argument/.
 1.13 05-Apr-2023  andvar remove some double ee typos in comments.
 1.12 28-May-2022  andvar fix various typos in comments.
 1.11 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.10 08-Apr-2022  andvar s/postion/position/
 1.9 07-Apr-2022  andvar fix various typos in comments.
 1.8 22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.7 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
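The truncation hazard this log entry describes can be shown with a small userland sketch. demo_uimin() and DEMO_MIN() are local stand-ins written for this illustration, not the libkern definitions, and the example assumes unsigned int is narrower than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for a helper defined on unsigned int only. */
static unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form quoted in the log: no narrowing, but an argument may be evaluated twice. */
#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Clamp through the unsigned-int helper: the high bits of len are lost. */
static uint64_t
clamp_narrowing(uint64_t len, uint64_t limit)
{
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	/* The casts make explicit the conversion a plain call would do silently. */
	return demo_uimin((unsigned int)len, (unsigned int)limit);
}

/* Clamp through the macro: comparison happens at full 64-bit width. */
static uint64_t
clamp_exact(uint64_t len, uint64_t limit)
{
	return DEMO_MIN(len, limit);
}
```

With a 32-bit unsigned int, clamping the length 0x100000001 to a limit of 16 gives 1 through clamp_narrowing() and 16 through clamp_exact(), which is the kind of silent difference the rename is meant to expose.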
 1.6 01-May-2018  maxv branches: 1.6.2;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.5 10-Dec-2017  rjs branches: 1.5.2;
Improve compliance to RFC 6458.
 1.4 25-Apr-2016  rjs branches: 1.4.16;
Fix build when IPSEC enabled.
 1.3 15-Feb-2016  rtr Fix building of IPv4-Mapped IPv6 addresses.

As discussed on tech-net@ use in6_sin_2_v4mapsin6() to build mapped
addresses.
 1.2 13-Dec-2015  christos branches: 1.2.2;
PR/50528: David Binderman: remove sizeof(sizeof(x))
 1.1 13-Oct-2015  rjs Add core networking support for SCTP.
 1.2.2.4 29-May-2016  skrll Sync with HEAD
 1.2.2.3 19-Mar-2016  skrll Sync with HEAD
 1.2.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.2.2.1 13-Dec-2015  skrll file sctp_indata.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.4.16.2 03-Dec-2017  jdolecek update from HEAD
 1.4.16.1 25-Apr-2016  jdolecek file sctp_indata.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.5.2.3 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.5.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.5.2.1 02-May-2018  pgoyette Synch with HEAD
 1.6.2.1 10-Jun-2019  christos Sync with HEAD
 1.15.2.1 02-Aug-2025  perseant Sync with HEAD
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18;
Add core networking support for SCTP.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_indata.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_indata.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.19 11-Jul-2025  andvar Fix various typos, mainly in comments and log/error messages.
 1.18 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.17 02-Feb-2024  andvar branches: 1.17.2;
fix various typos in comments.
 1.16 08-Apr-2022  andvar s/postion/position/
 1.15 19-Sep-2021  andvar fix various typos in comments, messages and documentation.
 1.14 28-May-2019  msaitoh s/recieve/receive/
 1.13 24-Feb-2019  kamil Add missing FALLTHROUGH in sctp_input.c

Requested by GCC NetBSD/i386 kUBSan KCOC build.
 1.12 12-Feb-2019  rjs Add some fallthrough annotations.
 1.11 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.10 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.9 01-May-2018  maxv branches: 1.9.2;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.8 26-Feb-2018  maxv branches: 1.8.2;
Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.7 27-Jun-2017  rjs branches: 1.7.4;
Check outgoing cookie size before accessing any contents.

Spotted in FreeBSD by maya.
 1.6 23-Jun-2017  rjs Make arguments match debug message.
 1.5 20-Apr-2017  ozaki-r Fix build of kernel with SCTP
 1.4 20-Apr-2017  ozaki-r Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.3 10-Jun-2016  ozaki-r branches: 1.3.2; 1.3.4;
Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
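The idea of keeping an interface index in the mbuf and resolving it through a table at use time can be sketched as a single-threaded userland model. All toy_* names are hypothetical; the real m_get_rcvif* APIs additionally use psref(9) or pserialize(9) to keep the object alive while it is in use, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXIF 8

/* Hypothetical stand-ins for ifnet and mbuf. */
struct toy_ifnet {
	unsigned int if_index;
	const char *if_xname;
};

struct toy_mbuf {
	unsigned int rcvif_index;	/* an index, not a pointer */
};

/* Stand-in for the interface collection; slot 0 means "no interface". */
static struct toy_ifnet *toy_ifindex2ifnet[TOY_MAXIF];

static void
toy_if_attach(struct toy_ifnet *ifp)
{
	assert(ifp->if_index > 0 && ifp->if_index < TOY_MAXIF);
	toy_ifindex2ifnet[ifp->if_index] = ifp;
}

static void
toy_if_detach(struct toy_ifnet *ifp)
{
	assert(ifp->if_index > 0 && ifp->if_index < TOY_MAXIF);
	toy_ifindex2ifnet[ifp->if_index] = NULL;
}

/* Resolve the receiving interface at use time; NULL if it has gone away. */
static struct toy_ifnet *
toy_m_get_rcvif(const struct toy_mbuf *m)
{
	unsigned int idx = m->rcvif_index;

	if (idx == 0 || idx >= TOY_MAXIF)
		return NULL;
	return toy_ifindex2ifnet[idx];
}
```

A packet that outlives its interface then resolves to NULL and can be dropped, instead of dereferencing a stale pointer.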
 1.2 25-Apr-2016  rjs Fix build when IPSEC enabled.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.5 28-Aug-2017  skrll Sync with HEAD
 1.1.2.4 09-Jul-2016  skrll Sync with HEAD
 1.1.2.3 29-May-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_input.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.3.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.3.2.1 26-Apr-2017  pgoyette Sync with HEAD
 1.7.4.2 03-Dec-2017  jdolecek update from HEAD
 1.7.4.1 27-Jun-2017  jdolecek file sctp_input.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.8.2.3 30-Sep-2018  pgoyette Sync with HEAD
 1.8.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.8.2.1 02-May-2018  pgoyette Synch with HEAD
 1.9.2.1 10-Jun-2019  christos Sync with HEAD
 1.17.2.1 02-Aug-2025  perseant Sync with HEAD
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18;
Add core networking support for SCTP.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_input.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_input.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.39 12-Jun-2025  ozaki-r sctp: follow the recent change of ip_newid()
 1.38 14-Apr-2025  andvar fix few typos in comments and update link to Printer Device Class spec.
 1.37 08-Sep-2024  rillig fix a/an grammar in obvious cases
 1.36 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.35 09-Feb-2024  andvar branches: 1.35.2;
fix spelling mistakes, mainly in comments and log messages.
 1.34 13-Sep-2023  bouyer handle EHOSTDOWN the same way as EHOSTUNREACH in sctp_med_chunk_output().
Compile-tested only (I don't have a sctp setup); proposed in
http://mail-index.netbsd.org/tech-net/2023/09/11/msg008611.html
LGTM from Greg Troxel and Robert Swindells
 1.33 04-Nov-2022  ozaki-r branches: 1.33.2;
inpcb: rename functions to in6pcb_*
 1.32 28-Oct-2022  ozaki-r Adjust dccp and sctp for struct inpcb separation
 1.31 31-May-2022  andvar fix various typos in comments, documentation and messages.
 1.30 28-May-2022  andvar fix various typos, mainly in comments.
 1.29 08-Apr-2022  andvar fix various typos, mainly in comments, but also log messages, docs, game text.
 1.28 05-Dec-2021  msaitoh s/futher/further/ in comment.
 1.27 05-Dec-2021  msaitoh s/measurment/measurement/ in comment.
 1.26 21-Oct-2021  andvar fix various typos, mainly in comments, but also in man pages and log messages.
 1.25 07-Sep-2021  andvar s/aquire/acquire/ in comments, also one typo fix acqure->acquire.
 1.24 03-Sep-2021  andvar fix typos in comments, mainly s/extention/extension/ and s/sufficent/sufficient/
 1.23 24-Jul-2021  andvar Fix all remaining typos, mainly in comments but also in few definitions and log messages, reported by me in PR kern/54889.
Also fixed some additional typos in comments, found on review of same files or typos.
 1.22 13-Jun-2020  roy branches: 1.22.6;
SCTP: Use ifp->if_mtu rather than ND_IFINFO(ifp)->linkmtu
 1.21 26-Dec-2019  msaitoh Fix typo in comment.
 1.20 03-Dec-2019  msaitoh s/upate/update/ in comment.
 1.19 13-Nov-2019  ozaki-r Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.18 22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.17 16-Sep-2018  skrll interrupt has two 'r's

fix another typo while I'm here (flsah)
 1.16 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.15 03-May-2018  maxv branches: 1.15.2;
Remove m_copy completely.
 1.14 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.13 30-Mar-2018  maya correct typo: and and -> and (comments only)

heads up on this being a common typo from chris28.
 1.12 10-Dec-2017  rjs branches: 1.12.2;
Improve compliance to RFC 6458.
 1.11 27-Jun-2017  rjs branches: 1.11.4;
Use host byte order for a debug message.
 1.10 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes codes of callers and IPsec a bit simple
 1.9 23-Dec-2016  maya branches: 1.9.2;
Remove extraneous parentheses. no functional change

Appeases clang
 1.8 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.
 1.7 07-Jul-2016  ozaki-r branches: 1.7.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.6 10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.5 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.4 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.3 25-Apr-2016  rjs Fix build when IPSEC enabled.
 1.2 03-Apr-2016  mlelstv Replace generic queue macros with IFNET/IFADDR macros.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.7 28-Aug-2017  skrll Sync with HEAD
 1.1.2.6 05-Feb-2017  skrll Sync with HEAD
 1.1.2.5 09-Jul-2016  skrll Sync with HEAD
 1.1.2.4 29-May-2016  skrll Sync with HEAD
 1.1.2.3 22-Apr-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_output.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.7.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.7.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.9.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.11.4.2 03-Dec-2017  jdolecek update from HEAD
 1.11.4.1 27-Jun-2017  jdolecek file sctp_output.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.12.2.6 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.12.2.5 30-Sep-2018  pgoyette Sync with HEAD
 1.12.2.4 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.12.2.3 21-May-2018  pgoyette Sync with HEAD
 1.12.2.2 02-May-2018  pgoyette Synch with HEAD
 1.12.2.1 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.15.2.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.15.2.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.15.2.1 10-Jun-2019  christos Sync with HEAD
 1.22.6.1 01-Aug-2021  thorpej Sync with HEAD.
 1.33.2.2 29-Jul-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1140):

sys/netinet/ip_output.c: revision 1.330
sys/netinet/sctp_output.c: revision 1.39
sys/netinet/ip_mroute.c: revision 1.166
sys/netipsec/ipsecif.c: revision 1.24
sys/netipsec/xform_ipip.c: revision 1.80
sys/netinet/ip_output.c: revision 1.327
sys/netinet/ip_output.c: revision 1.328
sys/netinet/ip_input.c: revision 1.406
sys/netinet/ip_output.c: revision 1.329
sys/netinet/in_var.h: revision 1.105

in: get rid of unused argument from ip_newid() and ip_newid_range()

in: take a reference of ifp on IP_ROUTETOIF
The ifp could be released after ia4_release(ia).

in: narrow the scope of ifa in ip_output (NFC)

sctp: follow the recent change of ip_newid()

in: avoid racy ifa_acquire(rt->rt_ifa) in ip_output()
If a rtentry is being destroyed asynchronously, ifa referenced by rt_ifa
can be destructed and taking ifa_acquire(rt->rt_ifa) aborts with a
KASSERT failure. Fortunately, the ifa is not actually freed because of
a reference by rt_ifa, so it remains usable (except for some functions like
psref) as long as the rtentry is held.
PR kern/59527

in: avoid racy ia4_acquire(ifatoia(rt->rt_ifa)) in ip_rtaddr()
Same as the case of ip_output(), it's racy and should be avoided.
PR kern/59527
 1.33.2.1 13-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #863):

sys/netinet/sctp_output.c: revision 1.34

handle EHOSTDOWN the same way as EHOSTUNREACH in sctp_med_chunk_output().

Compile-tested only (I don't have a sctp setup); proposed in
http://mail-index.netbsd.org/tech-net/2023/09/11/msg008611.html

LGTM from Greg Troxel and Robert Swindells
 1.35.2.1 02-Aug-2025  perseant Sync with HEAD
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18;
Add core networking support for SCTP.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_output.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_output.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.27 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.26 15-Oct-2022  andvar branches: 1.26.8;
fix various typos in documentation and comments.
mainly in words functionality, functional, function.
 1.25 28-May-2022  andvar fix various typos in comments.
 1.24 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.23 10-Dec-2021  andvar s/occured/occurred/ in comments, log messages and man pages.
 1.22 19-Sep-2021  andvar fix various typos in comments, messages and documentation.
 1.21 30-Apr-2020  riastradh Omit needless #include <sys/rnd.h>.
 1.20 19-Jan-2020  riastradh Replace kooky sctp random number generation by cprng_strong32().
 1.19 26-Dec-2019  msaitoh branches: 1.19.2;
Fix typo in comment.
 1.18 11-Dec-2018  christos PR/53775: Havard Eidnes: bind(2) may inaccurately return EADDRNOTAVAIL,
it should return EADDRINUSE.
 1.17 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
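
The truncation described above can be illustrated with a minimal sketch.
It is not the libkern code: uimin_sketch, MIN_SKETCH, clamp_truncating,
clamp_wide and demo are hypothetical names made up for this example, and
the expected values assume that unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a min helper defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Type-preserving macro: no truncation, but arguments are evaluated twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* Clamp a 64-bit length through the unsigned int helper. */
static uint64_t
clamp_truncating(uint64_t len, uint64_t limit)
{
	/* Both arguments are implicitly converted to unsigned int here. */
	return uimin_sketch(len, limit);
}

/* Clamp a 64-bit length with the macro; the comparison stays 64-bit. */
static uint64_t
clamp_wide(uint64_t len, uint64_t limit)
{
	return MIN_SKETCH(len, limit);
}

/* Assumes a 32-bit unsigned int: 2^32 + 5 is reduced to 5 on conversion. */
static void
demo(void)
{
	uint64_t big = (UINT64_C(1) << 32) + 5;

	assert(clamp_truncating(big, 100) == 5);	/* high bits lost */
	assert(clamp_wide(big, 100) == 100);		/* correct clamp */
}
```

Calling demo() exercises both paths: the function-based helper silently
returns the truncated value, while the macro returns the intended limit.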
 1.16 27-Feb-2018  maxv branches: 1.16.2; 1.16.4;
Dedup: merge

ipsec4_get_policy and ipsec6_get_policy
ipsec4_delete_pcbpolicy and ipsec6_delete_pcbpolicy

The already-existing ipsec_get_policy() function is inlined in the new
one.
 1.15 17-Oct-2017  rjs branches: 1.15.2;
Make SCTP work when IPSEC is also defined.
 1.14 17-Oct-2017  rjs Move call to sofree() to end of sctp_inpcb_free() and re-acquire
softnet_lock.

Logic copied from in_pcb.c.
 1.13 17-Oct-2017  rjs Remove duplicate assignment, comment doesn't match it anyway.
 1.12 17-Oct-2017  rjs Remove some foreign conditional code. NFC intended.
 1.11 17-Oct-2017  rjs Wrap pcb list check with #ifdef DEBUG.
 1.10 17-Oct-2017  rjs Remove function prototype that is no longer required. NFC
 1.9 28-Jun-2017  rjs Whitespace.
 1.8 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.7 07-Jul-2016  ozaki-r branches: 1.7.2;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.6 22-May-2016  rjs Remove rtcache reference to route before freeing the containing struct.
 1.5 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.4 25-Apr-2016  rjs Fix build when IPSEC enabled.
 1.3 14-Apr-2016  rjs Remove stray debug printf().
 1.2 03-Apr-2016  mlelstv Replace generic queue macros with IFNET/IFADDR macros.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.7 28-Aug-2017  skrll Sync with HEAD
 1.1.2.6 05-Feb-2017  skrll Sync with HEAD
 1.1.2.5 09-Jul-2016  skrll Sync with HEAD
 1.1.2.4 29-May-2016  skrll Sync with HEAD
 1.1.2.3 22-Apr-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_pcb.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.7.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.15.2.2 03-Dec-2017  jdolecek update from HEAD
 1.15.2.1 17-Oct-2017  jdolecek file sctp_pcb.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.16.4.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.16.4.1 10-Jun-2019  christos Sync with HEAD
 1.16.2.2 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.16.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.19.2.1 25-Jan-2020  ad Sync with head.
 1.26.8.1 02-Aug-2025  perseant Sync with HEAD
 1.8 02-Jun-2023  andvar follow the steps of Andrew Doran (ad) commit and fix more s/loose/lose/ typos.
also s/beyound/beyond/ and few others along the way, mainly in comments.
 1.7 28-Oct-2022  ozaki-r Adjust dccp and sctp for struct inpcb separation
 1.6 28-Oct-2022  ozaki-r Adjust pf, wg, dccp and sctp for struct inpcb integration
 1.5 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.4 09-Aug-2021  andvar fix various typos in compatibility, mainly in comments.
 1.3 19-Jan-2020  riastradh Replace kooky sctp random number generation by cprng_strong32().
 1.2 08-Jun-2019  rjs branches: 1.2.4;
Don't need 'extern' for function prototypes.
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18; 1.1.22;
Add core networking support for SCTP.
 1.1.22.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.1.22.1 10-Jun-2019  christos Sync with HEAD
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_pcb.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_pcb.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.4.1 25-Jan-2020  ad Sync with head.
 1.2 25-Apr-2016  rjs branches: 1.2.16;
Fix build when IPSEC enabled.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.3 29-May-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_peeloff.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.16.2 03-Dec-2017  jdolecek update from HEAD
 1.2.16.1 25-Apr-2016  jdolecek file sctp_peeloff.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18;
Add core networking support for SCTP.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_peeloff.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_peeloff.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.1 25-Jun-2019  rjs branches: 1.1.10;
Split out the prototypes for add/delete address into a separate header file.
 1.1.10.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.1.10.1 25-Jun-2019  martin file sctp_route.h was added on branch phil-wifi on 2020-04-13 08:05:16 +0000
 1.4 10-Aug-2023  andvar fix typos in comments s/iton/tion/ or s/ton/tion/.
 1.3 24-Jun-2023  msaitoh Fix typo in comment.
 1.2 25-Apr-2016  rjs branches: 1.2.16;
Fix build when IPSEC enabled.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.3 29-May-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_structs.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.16.2 03-Dec-2017  jdolecek update from HEAD
 1.2.16.1 25-Apr-2016  jdolecek file sctp_structs.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.6 16-Feb-2022  andvar fix various typos, mainly in comments.
 1.5 01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.4 10-Dec-2017  rjs branches: 1.4.2;
Add ipsec option header.
 1.3 08-Dec-2016  ozaki-r branches: 1.3.14;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.2 25-Apr-2016  rjs branches: 1.2.2;
Fix build when IPSEC enabled.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.4 05-Feb-2017  skrll Sync with HEAD
 1.1.2.3 29-May-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_timer.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.3.14.2 03-Dec-2017  jdolecek update from HEAD
 1.3.14.1 08-Dec-2016  jdolecek file sctp_timer.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.4.2.1 02-May-2018  pgoyette Synch with HEAD
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18;
Add core networking support for SCTP.
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_timer.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_timer.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.4 31-Jul-2018  rjs Change implementation of sctp_connectx() to use ioctl(2).
 1.3 10-Dec-2017  rjs branches: 1.3.2; 1.3.4;
Improve compliance to RFC 6458.
 1.2 28-Jun-2017  rjs branches: 1.2.4;
Pack assoc structs.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.3 28-Aug-2017  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_uio.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.4.2 03-Dec-2017  jdolecek update from HEAD
 1.2.4.1 28-Jun-2017  jdolecek file sctp_uio.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.3.4.1 10-Jun-2019  christos Sync with HEAD
 1.3.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.27 08-Sep-2024  rillig s/effect/affect/ in a few places
 1.26 08-Sep-2024  rillig fix a/an grammar in obvious cases
 1.25 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.24 09-Feb-2024  andvar branches: 1.24.2;
fix spelling mistakes, mainly in comments and log messages.
 1.23 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.22 06-Aug-2022  andvar s/blity/bility/ in various words, mainly in comments.
 1.21 29-May-2022  andvar fix various typos in comments and log messages.
 1.20 27-Apr-2020  rjs Do sctp_connectx() handling using ioctl() for IPv6 as well.
 1.19 25-Jun-2019  rjs Split out the prototypes for add/delete address into a separate header file.
 1.18 25-Feb-2019  maxv RIP6, CAN, SCTP and SCTP6 lack a length check in their _send() functions.
Fix RIP6 and CAN, add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+0b9692ae0f49f93b7dc7@syzkaller.appspotmail.com
 1.17 24-Feb-2019  maxv RIP, RIP6, DDP, SCTP and SCTP6 lack a length check in their _connect()
functions. Fix the first three, and add a big XXX in the SCTP ones.

Found by KASAN, triggered by SyzKaller.

Reported-by: syzbot+9eaf98dad6ca738c250d@syzkaller.appspotmail.com
 1.16 15-Feb-2019  rjs This really was a missing break.

Spotted by Rin Okuyama.
 1.15 12-Feb-2019  rjs Add some fallthrough annotations.
 1.14 28-Jan-2019  martin Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.13 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.12 31-Jul-2018  rjs Enable SCTP sysctl nodes.

Rename auto asconf one to match FreeBSD.
 1.11 31-Jul-2018  rjs Change implementation of sctp_connectx() to use ioctl(2).
 1.10 01-May-2018  maxv branches: 1.10.2;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.9 10-Dec-2017  rjs branches: 1.9.2;
Improve compliance to RFC 6458.
 1.8 17-Oct-2017  rjs branches: 1.8.2;
Make SCTP work when IPSEC is also defined.
 1.7 17-Oct-2017  rjs Set SPL level to match usage for TCP.
 1.6 07-Jul-2016  ozaki-r branches: 1.6.10;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.5 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.4 25-Apr-2016  rjs Fix build when IPSEC enabled.
 1.3 03-Apr-2016  mlelstv Replace generic queue macros with IFNET/IFADDR macros.
 1.2 13-Dec-2015  christos branches: 1.2.2;
PR/50529: David Binderman: Remove double sizeof
 1.1 13-Oct-2015  rjs Add core networking support for SCTP.
 1.2.2.5 09-Jul-2016  skrll Sync with HEAD
 1.2.2.4 29-May-2016  skrll Sync with HEAD
 1.2.2.3 22-Apr-2016  skrll Sync with HEAD
 1.2.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.2.2.1 13-Dec-2015  skrll file sctp_usrreq.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.6.10.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1175):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/sctp_usrreq.c 1.14
sys/netinet/tcp_usrreq.c 1.223
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/sctp6_usrreq.c 1.17
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.8.2.2 03-Dec-2017  jdolecek update from HEAD
 1.8.2.1 17-Oct-2017  jdolecek file sctp_usrreq.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.9.2.2 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.9.2.1 02-May-2018  pgoyette Synch with HEAD
 1.10.2.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.10.2.1 10-Jun-2019  christos Sync with HEAD
 1.24.2.1 02-Aug-2025  perseant Sync with HEAD
 1.4 27-Apr-2020  rjs Do sctp_connectx() handling using ioctl() for IPv6 as well.
 1.3 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.2 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.1 13-Oct-2015  rjs branches: 1.1.2; 1.1.18; 1.1.20; 1.1.22;
Add core networking support for SCTP.
 1.1.22.1 10-Jun-2019  christos Sync with HEAD
 1.1.20.2 30-Sep-2018  pgoyette Sync with HEAD
 1.1.20.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 13-Oct-2015  jdolecek file sctp_var.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctp_var.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.21 11-Jul-2025  andvar Fix various typos, mainly in comments and log/error messages.
 1.20 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.19 08-Apr-2022  andvar branches: 1.19.10;
s/postion/position/
 1.18 05-Dec-2021  msaitoh s/measurment/measurement/ in comment.
 1.17 24-Jul-2021  andvar Fix all remaining typos, mainly in comments but also in few definitions and log messages, reported by me in PR kern/54889.
Also fixed some additional typos in comments, found on review of same files or typos.
 1.16 19-Jan-2020  riastradh branches: 1.16.10;
Replace kooky sctp random number generation by cprng_strong32().
 1.15 13-Aug-2019  rjs branches: 1.15.2;
Remove unused checksum code.
 1.14 08-Nov-2018  msaitoh "s/ are are / are /" in comment. No functional change.
 1.13 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.12 16-Jan-2017  christos branches: 1.12.12; 1.12.14; 1.12.16;
ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.11 16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.10 07-Jul-2016  ozaki-r branches: 1.10.2; 1.10.4;
Switch the address list of interfaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.9 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.8 22-May-2016  rjs Use const for arguments to sctp_is_same_scope().
 1.7 12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.6 26-Apr-2016  rjs Fix build when IPSEC enabled.
 1.5 11-Apr-2016  ozaki-r Sweep unnecessary radix.h inclusions
 1.4 03-Apr-2016  mlelstv Replace generic queue macros with IFNET/IFADDR macros.
 1.3 06-Mar-2016  christos PR/50899: David Binderman: optimize memset
 1.2 15-Feb-2016  rtr Fix building of IPv4-Mapped IPv6 addresses.

As discussed on tech-net@ use in6_sin_2_v4mapsin6() to build mapped
addresses.
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.7 05-Feb-2017  skrll Sync with HEAD
 1.1.2.6 09-Jul-2016  skrll Sync with HEAD
 1.1.2.5 29-May-2016  skrll Sync with HEAD
 1.1.2.4 22-Apr-2016  skrll Sync with HEAD
 1.1.2.3 19-Mar-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctputil.c was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.10.4.1 21-Apr-2017  bouyer Sync with HEAD
 1.10.2.1 20-Mar-2017  pgoyette Sync with HEAD
 1.12.16.3 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.12.16.2 08-Apr-2020  martin Merge changes from current as of 20200406
 1.12.16.1 10-Jun-2019  christos Sync with HEAD
 1.12.14.2 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.12.14.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.12.12.2 03-Dec-2017  jdolecek update from HEAD
 1.12.12.1 16-Jan-2017  jdolecek file sctputil.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.15.2.1 25-Jan-2020  ad Sync with head.
 1.16.10.1 01-Aug-2021  thorpej Sync with HEAD.
 1.19.10.1 02-Aug-2025  perseant Sync with HEAD
 1.5 05-Jul-2024  rin sctp_m_freem: Safely accept NULL argument as m_freem(9)
 1.4 14-Apr-2024  andvar branches: 1.4.2;
s/force_comile_error/force_compile_error/
 1.3 19-Jan-2020  riastradh Replace kooky sctp random number generation by cprng_strong32().
 1.2 22-May-2016  rjs branches: 1.2.16; 1.2.20; 1.2.26;
Use const for arguments to sctp_is_same_scope().
 1.1 13-Oct-2015  rjs branches: 1.1.2;
Add core networking support for SCTP.
 1.1.2.3 29-May-2016  skrll Sync with HEAD
 1.1.2.2 27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.1.2.1 13-Oct-2015  skrll file sctputil.h was added on branch nick-nhusb on 2015-12-27 12:10:07 +0000
 1.2.26.1 25-Jan-2020  ad Sync with head.
 1.2.20.1 08-Apr-2020  martin Merge changes from current as of 20200406
 1.2.16.2 03-Dec-2017  jdolecek update from HEAD
 1.2.16.1 22-May-2016  jdolecek file sctputil.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.4.2.1 02-Aug-2025  perseant Sync with HEAD
 1.37 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.36 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.35 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.34 01-Nov-2019  christos branches: 1.34.8;
Add comments to the tcp flags.
 1.33 10-Jan-2017  christos branches: 1.33.16;
add a couple of lint comments.
 1.32 02-Jan-2017  christos Fix TCP signature code:
1. pack options more tightly instead of being generous with no/op
2. put TCP_SIGNATURE option before SACK
3. fix computation of options length, by deferring it
XXX: Really we should move the options setting code in one place instead
of having two copies one for input and one for output.
XXX: tcp_optlen/tcp_hdrsiz need to be fixed; they were wrong before too.
 1.31 14-Feb-2015  he branches: 1.31.2;
Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).
 1.30 07-Jan-2012  christos branches: 1.30.6; 1.30.22; 1.30.24;
make standalone
 1.29 11-Dec-2011  christos u_int -> uint
 1.28 25-Dec-2007  perry branches: 1.28.44; 1.28.48;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.27 02-Aug-2007  rmind branches: 1.27.4; 1.27.10; 1.27.12; 1.27.16; 1.27.20;
TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are very needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.26 20-Jun-2007  christos branches: 1.26.2;
- per socket keepalive settings
- settable connection establishment timeout
 1.25 09-Oct-2006  rpaulo branches: 1.25.8; 1.25.10;
Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.24 25-Sep-2006  rpaulo Remove line that shouldn't have been committed.
 1.23 25-Sep-2006  rpaulo PR/14806: NetBSD uses the wrong default TCP MSS.
No objections in tech-net.
 1.22 05-Sep-2006  rpaulo branches: 1.22.2; 1.22.4;
Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
 1.21 10-Dec-2005  elad branches: 1.21.4; 1.21.8;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.20 21-Jul-2005  riz Add a definition for TCPOLEN_SIGLEN from OpenBSD, so a kernel with
options TCP_SIGNATURE will compile again after the new PMTU checks
were brought in from OpenBSD. Approved by christos.
 1.19 07-Mar-2005  yamt branches: 1.19.4;
tcp_sack_option: the max number of sack blocks in a packet is 4, not 3.
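The limit of four follows from the option-space arithmetic. A minimal sketch, assuming the standard figures from RFC 793 and RFC 2018 (40 bytes of option space, a 2-byte SACK option header, 8 bytes per block); the helper name `max_sack_blocks` is hypothetical and is not taken from the tree:

```c
/* Standard TCP figures (RFC 793 / RFC 2018), not copied from tcp.h. */
#define OPT_SPACE_MAX   40  /* 60-byte max header minus 20-byte base header */
#define SACK_HDR_LEN    2   /* option kind + option length */
#define SACK_BLOCK_LEN  8   /* left edge + right edge, 4 bytes each */

/*
 * Hypothetical helper: number of SACK blocks that still fit once the
 * other options in the segment have consumed `used` bytes.
 */
static int
max_sack_blocks(int used)
{
	int room = OPT_SPACE_MAX - used - SACK_HDR_LEN;

	return room < 0 ? 0 : room / SACK_BLOCK_LEN;
}
```

With no other options, `max_sack_blocks(0)` yields 4; with a timestamp option padded to 12 bytes, `max_sack_blocks(12)` yields 3, which is probably where the smaller figure came from.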
 1.18 07-Dec-2004  yamt branches: 1.18.4; 1.18.6;
remove TCPOPT_MD5SIGNATURE because no one in our tree uses it
and it's duplicated with TCPOPT_SIGNATURE.
i preferred TCPOPT_SIGNATURE because it's used by FreeBSD and OpenBSD.
 1.17 07-May-2004  kleink Add definitions for the (currently unimplemented) ECN TCP flags;
from Chuck Swiger in PR standards/25058.
 1.16 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c,
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.15 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.14 05-May-2003  bjh21 branches: 1.14.2;
Header cleanup: Hide all of this file apart from the socket options
from POSIX/XNS applications.
 1.13 26-May-2001  matt Add TCP_MD5SIGNATURE option.
 1.12 05-Jul-2000  christos branches: 1.12.2;
added a linted comment about non-portable bitfields. Unfortunately it cannot
be fixed portably.
 1.11 20-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol data.
 1.10 04-Oct-1998  matt branches: 1.10.12; 1.10.18;
Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP
to NetBSD. Ignore the SACK & FACK stuff for now.
 1.9 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.8 17-Apr-1995  cgd spacing cleanup. also, minor type mixup fixups.
 1.7 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.6 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.5 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.4 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.10.18.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.10.12.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.12.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.14.2.7 11-Dec-2005  christos Sync with head.
 1.14.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.14.2.5 08-Mar-2005  skrll Sync with HEAD.
 1.14.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.14.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.14.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.14.2.1 03-Aug-2004  skrll Sync with HEAD
 1.18.6.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.18.4.1 29-Apr-2005  kent sync with -current
 1.19.4.4 21-Jan-2008  yamt sync with head
 1.19.4.3 03-Sep-2007  yamt sync with head.
 1.19.4.2 30-Dec-2006  yamt sync with head.
 1.19.4.1 21-Jun-2006  yamt sync with head.
 1.21.8.1 14-Sep-2006  yamt sync with head.
 1.21.4.1 09-Sep-2006  rpaulo sync with head
 1.22.4.1 22-Oct-2006  yamt sync with head
 1.22.2.1 18-Nov-2006  ad Sync with head.
 1.25.10.1 11-Jul-2007  mjf Sync with head.
 1.25.8.2 20-Aug-2007  ad Sync with HEAD.
 1.25.8.1 15-Jul-2007  ad Sync with head.
 1.26.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.27.20.2 02-Aug-2007  rmind TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are badly needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.27.20.1 02-Aug-2007  rmind file tcp.h was added on branch matt-mips64 on 2007-08-02 02:42:41 +0000
 1.27.16.1 02-Jan-2008  bouyer Sync with HEAD
 1.27.12.1 26-Dec-2007  ad Sync with head.
 1.27.10.1 18-Feb-2008  mjf Sync with HEAD.
 1.27.4.1 09-Jan-2008  matt sync with HEAD
 1.28.48.1 18-Feb-2012  mrg merge to -current.
 1.28.44.1 17-Apr-2012  yamt sync with head
 1.30.24.2 05-Feb-2017  skrll Sync with HEAD
 1.30.24.1 06-Apr-2015  skrll Sync with HEAD
 1.30.22.1 21-Feb-2015  martin Pull up following revision(s) (requested by he in ticket #530):
sys/netinet/tcp_output.c: revision 1.180
sys/netinet/tcp_input.c: revision 1.336
sys/netinet/tcp_usrreq.c: revision 1.203
share/man/man4/tcp.4: revision 1.30
sys/netinet/tcp.h: revision 1.31
sys/netinet/tcp_subr.c: revision 1.258
sys/netinet/tcp_var.h: revision 1.176
sys/netinet/tcp_var.h: revision 1.177
sys/sys/param.h: bump revision

Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).

Change the new counter variables in struct tcpcb to uint32_t, as
per christos' comments.
 1.30.6.1 03-Dec-2017  jdolecek update from HEAD
 1.31.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.31.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.33.16.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.34.8.1 03-Apr-2021  thorpej Sync with HEAD.
 1.29 14-May-2024  andvar fix recently committed typos by msaitoh in few more places, as well as few more.
mainly s/contigous/contiguous/ and s/miliseconds/milliseconds/ in comments.
 1.28 31-Jul-2021  andvar s/threshhold/threshold
 1.27 09-Oct-2019  msaitoh branches: 1.27.12;
All of snd_wnd, snd_cwnd and snd_ssthresh in struct tcpcb are u_long,
so use u_long and ulmin() instead of u_int and uimin(). Found by lgtm bot.

XXX TCP's sequence number is uint32_t, so it might be good to change some
entries in struct tcpcb to uint32_t instead of u_long. FreeBSD did it.
 1.26 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
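The truncation hazard described above can be reproduced in a few lines. A minimal sketch: `uimin` below has the same shape as the libkern helper (defined on unsigned int), while `truncating_min` is a hypothetical caller written only for illustration; the result assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Same shape as the libkern helper: defined on unsigned int only. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical caller: passes 64-bit values through the 32-bit min. */
static inline uint64_t
truncating_min(uint64_t a, uint64_t b)
{
	/* Both arguments are implicitly converted to unsigned int here. */
	return uimin(a, b);
}

/* The macro form keeps the operand width but evaluates arguments twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
```

Calling `truncating_min(0x100000001ULL, 5)` returns 1 rather than 5, because the high bits of the first argument are discarded before the comparison; `MIN(0x100000001ULL, 5ULL)` returns 5.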
 1.25 03-May-2018  maxv branches: 1.25.2;
Remove now unused tcpip.h includes. Some were already unused before.
 1.24 29-Mar-2018  maxv Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to
understand.

Also make tcp6_mtudisc() static in tcp_subr.c.
 1.23 02-Jan-2017  skrll branches: 1.23.14;
Restore behaviour to pre- tcp_congctl.c:1.18 for SACK. Further analysis
of the change is required.

OK kefren@

PR/51753 tcp SACK causes SSH disconnect
 1.22 13-Dec-2016  ozaki-r Remove unnecessary inclusions of nd6.h
 1.21 26-Apr-2016  ozaki-r branches: 1.21.2;
Sweep unnecessary route.h inclusions
 1.20 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.19 18-Nov-2013  kefren branches: 1.19.4; 1.19.6; 1.19.8; 1.19.10;
Cubic changes:
* correct W(t) calculation
* check wmax limits
* change W_max in slow and fast retransmit
* correct rtt approximation
Reno:
* move comment I forgot behind after fast_retransmit() split
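For reference, the window function W(t) named above can be written in the form given by RFC 8312. This is an assumption for illustration: the log does not say which formulation or constants the code follows.

```latex
W_{\mathrm{cubic}}(t) = C\,(t - K)^{3} + W_{\max},
\qquad
K = \sqrt[3]{\frac{W_{\max}\,(1 - \beta_{\mathrm{cubic}})}{C}}
```

Here t is the time since the last window reduction, W_max is the window size just before that reduction, and RFC 8312 suggests C = 0.4 and beta_cubic = 0.7.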
 1.18 12-Nov-2013  kefren * implement TCP CUBIC congestion control algorithm
* move tcp_sack_newack bits inside reno and newreno_fast_retransmit_newack
* notify ECN peer about cwnd shrink in [new]reno_slow_retransmit

Based on the patch proposed on tech-net@ on Nov 7 with minor improvements:
* adapt wmax for no-fast convergence case
* correct cbrt calculation for big window sizes (>750KB)
 1.17 25-Oct-2013  martin Mark a diagnostic-only variable
 1.16 08-Apr-2011  yamt branches: 1.16.4; 1.16.14; 1.16.18;
simplify code a little. no functional changes.
 1.15 28-Apr-2008  martin branches: 1.15.22; 1.15.28;
Remove clause 3 and 4 from TNF licenses
 1.14 29-Feb-2008  matt branches: 1.14.2; 1.14.4;
Rework tcp congctl selection code so that the congctl entries can be const.
Don't access tcp_congctl stuff outside of tcp_congctl.c, use routines to
update t_congctl. This code is now slightly more complicated.
 1.13 11-Jul-2007  xtraeme branches: 1.13.8; 1.13.24; 1.13.28;
Replace a simple lock with a mutex and make it static.
 1.12 16-Nov-2006  christos branches: 1.12.2; 1.12.6; 1.12.12;
__unused removal on arguments; approved by core.
 1.11 21-Oct-2006  yamt branches: 1.11.2;
constify.
 1.10 19-Oct-2006  yamt tcp_reno_newack: remove an __unused because it's now used.
 1.9 19-Oct-2006  yamt tcp_reno_newack: regardless of sysctl setting, use L=1*SMSS when
we are doing retransmission.
 1.8 19-Oct-2006  yamt implement RFC3465 appropriate byte counting.
from Kentaro A. Kurahone, with minor adjustments by me.
the ack prediction part of the original patch was omitted because
it's a separate change. reviewed by Rui Paulo.
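The RFC 3465 slow-start rule, together with the L=1*SMSS case from revision 1.9, can be sketched as follows. This is a minimal illustration and not the code in tcp_congctl.c; the function name and all parameter names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of RFC 3465 slow-start growth: cwnd grows by the
 * number of newly acknowledged bytes, capped at L*SMSS per ACK.
 * `l_factor` stands in for the configurable limit; while retransmitting
 * the limit is forced to 1*SMSS regardless of that setting.
 */
static uint32_t
abc_slow_start(uint32_t cwnd, uint32_t acked, uint32_t smss,
    uint32_t l_factor, int retransmitting)
{
	uint32_t limit = (retransmitting ? 1 : l_factor) * smss;
	uint32_t incr = acked < limit ? acked : limit;

	return cwnd + incr;
}
```

For example, with SMSS = 1460 and cwnd = 2920, an ACK covering 4000 bytes grows cwnd to 5840 when L = 2, but only to 4380 while retransmitting.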
 1.7 15-Oct-2006  rpaulo Move comments to proper places.
 1.6 15-Oct-2006  rpaulo Add a new tcp_congctl(9) structure member for congestion experienced callback.
Needed by HSTCP.
 1.5 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.4 10-Oct-2006  rpaulo tcp_reno_newack(): bring the exact original code.
tcp_newreno_newack(): call tcp_reno_newack() if partialacks < 0.
 1.3 10-Oct-2006  yamt tcp_reno_newack/tcp_newreno_newack: remove stale comments.
 1.2 10-Oct-2006  yamt tcp_newreno_newack: actually inflate cwnd as it used to do.
 1.1 09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.11.2.3 10-Dec-2006  yamt sync with head.
 1.11.2.2 22-Oct-2006  yamt sync with head
 1.11.2.1 21-Oct-2006  yamt file tcp_congctl.c was added on branch yamt-splraiseipl on 2006-10-22 06:07:28 +0000
 1.12.12.1 15-Jul-2007  ad Sync with head.
 1.12.6.4 17-Mar-2008  yamt sync with head.
 1.12.6.3 03-Sep-2007  yamt sync with head.
 1.12.6.2 30-Dec-2006  yamt sync with head.
 1.12.6.1 16-Nov-2006  yamt file tcp_congctl.c was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.12.2.2 18-Nov-2006  ad Sync with head.
 1.12.2.1 16-Nov-2006  ad file tcp_congctl.c was added on branch newlock2 on 2006-11-18 21:39:36 +0000
 1.13.28.2 02-Jun-2008  mjf Sync with HEAD.
 1.13.28.1 03-Apr-2008  mjf Sync with HEAD.
 1.13.24.1 24-Mar-2008  keiichi sync with head.
 1.13.8.1 23-Mar-2008  matt sync with HEAD
 1.14.4.1 16-May-2008  yamt sync with head.
 1.14.2.1 18-May-2008  yamt sync with head.
 1.15.28.1 06-Jun-2011  jruoho Sync with HEAD.
 1.15.22.1 21-Apr-2011  rmind sync with head
 1.16.18.1 18-May-2014  rmind sync with head
 1.16.14.2 03-Dec-2017  jdolecek update from HEAD
 1.16.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.16.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.19.10.1 18-Jan-2017  skrll Sync with netbsd-5
 1.19.8.1 05-Jan-2017  martin Pull up following revision(s) (requested by skrll in ticket #1347):
sys/netinet/tcp_congctl.c: revision 1.23
Restore behaviour to pre- tcp_congctl.c:1.18 for SACK. Further analysis
of the change is required.
OK kefren@
PR/51753 tcp SACK causes SSH disconnect
 1.19.6.3 05-Feb-2017  skrll Sync with HEAD
 1.19.6.2 29-May-2016  skrll Sync with HEAD
 1.19.6.1 22-Sep-2015  skrll Sync with HEAD
 1.19.4.1 05-Jan-2017  martin Pull up following revision(s) (requested by skrll in ticket #1347):
sys/netinet/tcp_congctl.c: revision 1.23
Restore behaviour to pre- tcp_congctl.c:1.18 for SACK. Further analysis
of the change is required.
OK kefren@
PR/51753 tcp SACK causes SSH disconnect
 1.21.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.23.14.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.23.14.2 21-May-2018  pgoyette Sync with HEAD
 1.23.14.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.25.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.25.2.1 10-Jun-2019  christos Sync with HEAD
 1.27.12.1 01-Aug-2021  thorpej Sync with HEAD.
 1.7 12-Nov-2013  kefren * implement TCP CUBIC congestion control algorithm
* move tcp_sack_newack bits inside reno and newreno_fast_retransmit_newack
* notify ECN peer about cwnd shrink in [new]reno_slow_retransmit

Based on the patch proposed on tech-net@ on Nov 7 with minor improvements:
* adapt wmax for no-fast convergence case
* correct cbrt calculation for big window sizes (>750KB)
 1.6 14-Apr-2011  yamt branches: 1.6.4; 1.6.14; 1.6.18;
- comments
- g/c stale extern
 1.5 28-Apr-2008  martin branches: 1.5.22; 1.5.28;
Remove clause 3 and 4 from TNF licenses
 1.4 29-Feb-2008  matt branches: 1.4.2; 1.4.4;
Rework tcp congctl selection code so that the congctl entries can be const.
Don't access tcp_congctl stuff outside of tcp_congctl.c, use routines to
update t_congctl. This code is now slightly more complicated.
 1.3 21-Oct-2006  yamt branches: 1.3.2; 1.3.4; 1.3.8; 1.3.30; 1.3.50; 1.3.54;
constify.
 1.2 15-Oct-2006  rpaulo Add a new tcp_congctl(9) structure member for congestion experienced callback.
Needed by HSTCP.
 1.1 09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.3.54.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.54.1 03-Apr-2008  mjf Sync with HEAD.
 1.3.50.1 24-Mar-2008  keiichi sync with head.
 1.3.30.1 23-Mar-2008  matt sync with HEAD
 1.3.8.3 17-Mar-2008  yamt sync with head.
 1.3.8.2 30-Dec-2006  yamt sync with head.
 1.3.8.1 21-Oct-2006  yamt file tcp_congctl.h was added on branch yamt-lazymbuf on 2006-12-30 20:50:33 +0000
 1.3.4.2 18-Nov-2006  ad Sync with head.
 1.3.4.1 21-Oct-2006  ad file tcp_congctl.h was added on branch newlock2 on 2006-11-18 21:39:36 +0000
 1.3.2.2 22-Oct-2006  yamt sync with head
 1.3.2.1 21-Oct-2006  yamt file tcp_congctl.h was added on branch yamt-splraiseipl on 2006-10-22 06:07:28 +0000
 1.4.4.1 16-May-2008  yamt sync with head.
 1.4.2.1 18-May-2008  yamt sync with head.
 1.5.28.1 06-Jun-2011  jruoho Sync with HEAD.
 1.5.22.1 21-Apr-2011  rmind sync with head
 1.6.18.1 18-May-2014  rmind sync with head
 1.6.14.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.6.4.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.32 03-May-2018  maxv Remove now unused tcpip.h includes. Some were already unused before.
 1.31 29-Mar-2018  maxv Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to
understand.

Also make tcp6_mtudisc() static in tcp_subr.c.
 1.30 26-Apr-2016  ozaki-r branches: 1.30.16;
Sweep unnecessary route.h inclusions
 1.29 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.28 18-Apr-2009  tsutsui branches: 1.28.22; 1.28.40;
Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.27 18-Mar-2009  cegger bcopy -> memcpy
 1.26 18-Mar-2009  cegger bzero -> memset
 1.25 04-Mar-2007  christos branches: 1.25.40; 1.25.50; 1.25.56;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.24 11-Dec-2005  christos branches: 1.24.26;
merge ktrace-lwp.
 1.23 06-Sep-2005  rpaulo Wrap two big lines.
 1.22 02-Jun-2005  riz branches: 1.22.2;
Fix some const fallout.
 1.21 03-Feb-2005  perry ANSIfy function declarations
 1.20 13-Jan-2005  drochner branches: 1.20.2; 1.20.4;
compile tcp_debug.c only if the TCP_DEBUG option is set,
and remove the "#ifdef TCP_DEBUG" around everything
 1.19 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.18 09-Jun-2002  itojun branches: 1.18.6;
whitespace
 1.17 13-Nov-2001  lukem branches: 1.17.8;
add RCSIDs
 1.16 08-Jul-2001  abs branches: 1.16.2;
Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
 1.15 08-Jul-2001  abs Give TCPDEBUG a chance of working - fix printf() types, add missing &s,
and remove attempt to use a nonexistent tcphdr field.
 1.14 01-Jul-1999  itojun branches: 1.14.14;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.13 13-Oct-1996  christos branches: 1.13.24; 1.13.26;
backout previous kprintf change
 1.12 13-Oct-1996  christos backout previous kprintf changes
 1.11 10-Oct-1996  christos printf -> kprintf, sprintf -> ksprintf
 1.10 13-Feb-1996  christos netinet prototypes
 1.9 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.6 08-Jan-1994  mycroft Prototypes.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.13.26.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.13.26.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.13.26.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.13.24.1 01-Jul-1999  thorpej Sync w/ -current.
 1.14.14.3 20-Jun-2002  nathanw Catch up to -current.
 1.14.14.2 14-Nov-2001  nathanw Catch up to -current.
 1.14.14.1 24-Aug-2001  nathanw Catch up with -current.
 1.16.2.2 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.16.2.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.17.8.1 20-Jun-2002  gehenna catch up with -current.
 1.18.6.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.18.6.5 04-Feb-2005  skrll Sync with HEAD.
 1.18.6.4 17-Jan-2005  skrll Sync with HEAD.
 1.18.6.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.18.6.2 18-Sep-2004  skrll Sync with HEAD.
 1.18.6.1 03-Aug-2004  skrll Sync with HEAD
 1.20.4.1 12-Feb-2005  yamt sync with head.
 1.20.2.1 29-Apr-2005  kent sync with -current
 1.22.2.2 03-Sep-2007  yamt sync with head.
 1.22.2.1 21-Jun-2006  yamt sync with head.
 1.24.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.25.56.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.25.50.1 28-Apr-2009  skrll Sync with HEAD.
 1.25.40.1 04-May-2009  yamt sync with head.
 1.28.40.2 29-May-2016  skrll Sync with HEAD
 1.28.40.1 22-Sep-2015  skrll Sync with HEAD
 1.28.22.1 03-Dec-2017  jdolecek update from HEAD
 1.30.16.2 21-May-2018  pgoyette Sync with HEAD
 1.30.16.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.21 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.20 03-Feb-2021  roy tcp_debug: restore __packed
 1.19 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.18 01-May-2018  maxv branches: 1.18.14;
Redefine the structure, not to rely on tcpiphdr.
 1.17 04-Mar-2007  christos branches: 1.17.128;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.16 10-Dec-2005  elad branches: 1.16.26;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.15 02-Jun-2005  riz branches: 1.15.2;
Fix some const fallout.
 1.14 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.13 09-Jul-2001  itojun branches: 1.13.22;
do not #ifdef in headers. usr.sbin/trpt needs it.
 1.12 08-Jul-2001  abs Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
 1.11 30-May-2001  mrg use _KERNEL_OPT
 1.10 29-Apr-2001  fvdl Make it possible to override TCP_NDEBUG. The default value of 100
wastes quite a bit of space (0xfa00).
 1.9 31-Jul-1999  itojun branches: 1.9.14;
sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.8 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.7 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.6 10-Feb-1998  perry branches: 1.6.10; 1.6.12;
add/cleanup multiple inclusion protection.
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6.12.2 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.6.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.6.10.2 02-Aug-1999  thorpej Update from trunk.
 1.6.10.1 01-Jul-1999  thorpej Sync w/ -current.
 1.9.14.2 24-Aug-2001  nathanw Catch up with -current.
 1.9.14.1 21-Jun-2001  nathanw Catch up to -current.
 1.13.22.5 11-Dec-2005  christos Sync with head.
 1.13.22.4 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.13.22.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.13.22.2 18-Sep-2004  skrll Sync with HEAD.
 1.13.22.1 03-Aug-2004  skrll Sync with HEAD
 1.15.2.2 03-Sep-2007  yamt sync with head.
 1.15.2.1 21-Jun-2006  yamt sync with head.
 1.16.26.1 12-Mar-2007  rmind Sync with HEAD.
 1.17.128.1 02-May-2018  pgoyette Synch with HEAD
 1.18.14.1 03-Apr-2021  thorpej Sync with HEAD.
 1.16 07-Apr-2018  maxv Remove dead code.
 1.15 10-Dec-2005  elad branches: 1.15.162;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.14 20-Apr-2004  matt branches: 1.14.12;
export tcpstates for _KERNEL and remove tcp_usrreq.c's incorrect
declaration.
 1.13 20-Nov-2003  yamt comments on tcp_outflags.
 1.12 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.11 20-Oct-2001  matt branches: 1.11.18;
Make tcp_outflags & tcpstates const.
 1.10 09-Jul-1998  mycroft branches: 1.10.26; 1.10.28;
Back out the change from TCP/IP vol 2, in revision 1.7, which removed TH_FIN
from the output flags for CLOSING state. There is no harm in retransmitting
the FIN, and this change has unexpected side effects that break simultaneous
close behaviour.
 1.9 03-Jul-1998  thorpej Fix TCPS_HAVERCVDFIN() to actually catch all TCP states in which a FIN
has been received (CLOSE_WAIT, CLOSING, LAST_ACK, and TIME_WAIT).

From David Borman <dab@bsdi.com>.
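The four states listed above do not form a contiguous range in the traditional BSD state numbering, because FIN_WAIT_2 sits between LAST_ACK and TIME_WAIT; a single ordered comparison therefore cannot select exactly this set. A minimal sketch, assuming that numbering (written from memory; check it against tcp_fsm.h) and using a hypothetical predicate name rather than the real macro:

```c
/* Traditional BSD TCP state numbering (assumed; verify against tcp_fsm.h). */
enum tcp_state {
	ST_CLOSED = 0, ST_LISTEN, ST_SYN_SENT, ST_SYN_RECEIVED,
	ST_ESTABLISHED, ST_CLOSE_WAIT, ST_FIN_WAIT_1, ST_CLOSING,
	ST_LAST_ACK, ST_FIN_WAIT_2, ST_TIME_WAIT
};

/*
 * Hypothetical predicate: true in exactly the states where the peer's
 * FIN has already been received.
 */
static int
have_rcvd_fin(enum tcp_state s)
{
	return s == ST_CLOSE_WAIT || s == ST_CLOSING ||
	    s == ST_LAST_ACK || s == ST_TIME_WAIT;
}
```

Spelling out the states costs a few comparisons but makes the intent explicit; `have_rcvd_fin(ST_FIN_WAIT_2)` is false even though that state is numerically above LAST_ACK.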
 1.8 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.7 18-Jun-1997  kleink branches: 1.7.8;
As per RFC 793, don't retransmit the FIN during a simultaneous close.
From Thorsten Frueauf <frueauf@ira.uka.de> and W. Richard Stevens in PR/3737
and TCP/IP Illustrated, Vol. 2, respectively.
 1.6 14-Oct-1994  mycroft Don't return received data to the user until the initial handshake is complete.
Also use TCPS_HAVEESTABLISHED() in a few other places.
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.7.8.1 01-Oct-1998  cgd pull up revs 1.9-1.10 from trunk. (mycroft)
 1.10.28.1 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.26.1 22-Oct-2001  nathanw Catch up to -current.
 1.11.18.4 11-Dec-2005  christos Sync with head.
 1.11.18.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.11.18.2 18-Sep-2004  skrll Sync with HEAD.
 1.11.18.1 03-Aug-2004  skrll Sync with HEAD
 1.14.12.1 21-Jun-2006  yamt sync with head.
 1.15.162.1 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.441 08-Oct-2024  rin tcp_reass: Mitigate CVE-2018-6922 (SegmentSmack)

at a level of FreeBSD, by introducing an arbitrary (100) limit to
the length of TCP reassembly queues:

https://github.com/freebsd/freebsd-src/commit/95a914f6316874f5b0c45d491f2843dc810071ef

Originally authored by ryo@.

We thank Tomoyuki Sahara <tsahara at iij>, who has analyzed the
problem again, updated the patch, and carried out experiments for
vulnerability scenarios. The confidential PR below is based on
his work.

PR security/58708
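
The idea behind the 1.441 mitigation, a hard cap on reassembly queue length, can be sketched as below. Only the limit of 100 comes from the log message; the struct, the constant name and the function are hypothetical, and the real queue holds segments rather than a bare counter.

```c
#include <stddef.h>

/* Cap taken from the log message; the name is made up for this sketch. */
#define REASS_QLEN_MAX_SKETCH 100

struct reass_queue_sketch {
	size_t len;	/* segments currently queued */
};

/* Returns 1 if the segment may be queued, 0 if it must be dropped. */
static int
reass_try_enqueue_sketch(struct reass_queue_sketch *q)
{
	if (q->len >= REASS_QLEN_MAX_SKETCH)
		return 0;
	q->len++;
	return 1;
}
```

A fixed cap bounds the per-connection work an attacker can force with many small out-of-order segments.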
 1.440 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.439 29-Jun-2024  riastradh branches: 1.439.2;
netinet: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.438 04-Nov-2022  ozaki-r branches: 1.438.2;
inpcb: rename functions to in6pcb_*
 1.437 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.436 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.435 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.434 20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.433 24-May-2022  andvar fix various typos in comment, documentation and log messages.
 1.432 23-Mar-2022  andvar fix few typos in comments, mainly s/paramenters/parameters/.
 1.431 09-Aug-2021  andvar fix typos in asymmetry, asymmetric(al), symmetrical.
 1.430 06-Aug-2021  andvar fix various typos in comments.
 1.429 31-Jul-2021  andvar s/threshhold/threshold
 1.428 08-Mar-2021  christos branches: 1.428.4;
Remove the unused "addin" argument (it was always 0) and go back using
a random iss by default (instead of rfc1948)
 1.427 19-Feb-2021  jakllsch it's spelled struct tcphdr, not struct tcp_hdr
 1.426 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
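
Why __alignof(t) is the better mask than sizeof(t) in 1.426 can be seen with a type whose size is not its alignment. The macros below are a hedged sketch with hypothetical names, not the kernel's ALIGNED_POINTER; the struct is assumed to have size 6 and alignment 2, as it does on common ABIs.

```c
#include <stdint.h>

/* Assumed layout: sizeof == 6, alignment == 2 on common ABIs. */
struct hdr_sketch {
	uint16_t a, b, c;
};

/* Mask derived from the size: meaningless when the size is not a power of 2. */
#define ALIGNED_BY_SIZEOF_SKETCH(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Mask derived from the alignment the compiler actually requires (C11). */
#define ALIGNED_BY_ALIGNOF_SKETCH(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)
```

Under the stated layout assumption, address 4 is a valid location for struct hdr_sketch, yet the sizeof-based test rejects it while the alignof-based test accepts it.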
 1.425 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.424 29-Sep-2020  msaitoh branches: 1.424.2;
s/occurence/occurrence/
 1.423 13-Sep-2020  roy inet: Fix build without ARP
 1.422 11-Sep-2020  roy ARP: Use ND rather than our own.

This brings the benefit of Neighbour Unreachability Detection which is
something ARP sorely lacks.

The new timings mirror those of IPv6 and are adjustable via sysctl(8).
Unlike IPv6 ND, these are global and not per interface.
 1.421 11-Sep-2020  roy tcp_input: Adjust for ND changes
 1.420 11-Sep-2020  kardel PR/kern 55567

fix the data-only fast path. RCV.UP and SND.WL1 could be left behind
on long sequences of data only packets. pull them along to avoid relative
sequence wraps.

consistent with FreeBSD

addresses second failure mode of PR/kern 55567.

pullup to netbsd-8
pullup to netbsd-9
 1.419 02-Sep-2020  kardel Fix fast path for uni directional transfers
pure ACK case:

drag snd_wl2 along so only newer
ACKs can update the window size.
also avoids the state where snd_wl2
is eventually larger than th_ack and thus
blocking the window update mechanism and
the connection gets stuck for a loooong
time in the zero sized send window state.

see PR/kern 55567

ok thorpej@, also found in FreeBSD
 1.418 06-Jul-2020  christos - always set both ip and ip6, otherwise a kernel assertion can be triggered
- move alignment early so that we do less work
 1.417 16-Nov-2019  maxv Call rtcache_unref() only when the checks succeed, instead of relying on
another NULL check in rtcache_unref().

Because, in order to resolve the address of the second argument, we do a
dereference on 'tp', which is theoretically allowed to be NULL. The five
callers of nd6_hint() never pass a NULL argument however, so by luck the
actual NULL deref never happens.

Maybe the NULL check on 'tp' should be replaced with a KASSERT ensuring
it isn't NULL, for clarity.

Reported by kUBSan.
 1.416 25-Sep-2019  jnemeth PR/54572 - Edgar Fuß -- error in comment
 1.415 06-Aug-2019  riastradh Clamp tcp timer quantities to reasonable ranges.

Reported-by: syzbot+259675123340bf46a6de@syzkaller.appspotmail.com
 1.414 01-Jun-2019  kamil branches: 1.414.2;
Replace potentially misaligned pointer dereference + htonl() with be32dec()

Reported by kUBSan.
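
The technique behind 1.414 is to assemble the big-endian value byte by byte instead of dereferencing a possibly misaligned 32-bit pointer. The function below is a portable sketch of such a decoder under a hypothetical name; it is not the system's be32dec() itself.

```c
#include <stdint.h>

/*
 * Decode a 32-bit big-endian value from an arbitrary byte address.
 * Reading single bytes is always aligned, so this is safe on
 * strict-alignment machines and independent of host byte order.
 */
static uint32_t
be32dec_sketch(const void *buf)
{
	const uint8_t *p = buf;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```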
 1.413 08-Nov-2018  msaitoh "s/ an an / an /" in comment. No functional change.
 1.412 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.411 14-Sep-2018  maxv rename toff -> off
 1.410 14-Sep-2018  maxv rename off -> thlen
 1.409 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
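
The truncation hazard that motivated 1.409 can be reproduced in a few lines. This is a sketch with hypothetical names, assuming a 32-bit unsigned int; the explicit casts stand in for the implicit narrowing that a prototype taking unsigned int would perform silently.

```c
#include <stdint.h>

/* Same shape as a min() defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * What happens when 64-bit values go through the unsigned int version:
 * the upper 32 bits are discarded before the comparison is made.
 */
static uint64_t
min_u64_via_uint_sketch(uint64_t a, uint64_t b)
{
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

With a = 0x100000001 and b = 5, the narrowed a becomes 1, so the function reports 1 even though the true minimum is 5.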
 1.408 18-May-2018  maxv branches: 1.408.2;
IP6_EXTHDR_GET -> M_REGION_GET, no functional change.
 1.407 03-May-2018  maxv Remove now unused tcpip.h includes. Some were already unused before.
 1.406 28-Apr-2018  maxv Remove unused ipsec_var.h includes.
 1.405 08-Apr-2018  maxv Remove the ipre_mlast field and the TRAVERSE macro.

The goal was to store in ipre_mlast the last mbuf of the chain, so that
m_cat could be called on it. But it's not needed, since m_cat already
does the equivalent of TRAVERSE itself.

If it were needed, there would be a bug, since we don't call TRAVERSE on
ipre_mlast when creating a new reassembly entry.
 1.404 03-Apr-2018  maxv Remove ipsec_copy_policy and ipsec_copy_pcbpolicy. No functional change,
since we used only ipsec_copy_pcbpolicy, and it was a no-op.

Originally we were using ipsec_copy_policy to optimize the IPsec-PCB
cache: when an ACK was received in response to a SYN, we used to copy the
SP cached in the SYN's PCB into the ACK's PCB, so that
ipsec_getpolicybysock could use the cached SP instead of requerying it.

Then we switched to ipsec_copy_pcbpolicy which has always been a no-op. As
a result the SP cached in the SYN was/is not copied in the ACK, and the
first call to ipsec_getpolicybysock had to query the SP and cache it
itself. It's not totally clear to me why this change was made.

But it has been this way for years, and after a conversation with Ryota
Ozaki it turns out the optimization is not valid anymore due to
MP-ification, so it won't be re-enabled.

ok ozaki-r@
 1.403 30-Mar-2018  maxv Fix the log. mtod never returns NULL, so 'ip' is always non-NULL, and the
'ip6' branch is never taken. As a result we log garbage on IPv6 packets.

Use ip_v instead.
 1.402 30-Mar-2018  maxv Use consttime_memequal instead of memcmp, to prevent side channels. This
function returns 1 when the buffers are equal, contrary to memcmp, hence
the !.
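
The difference from memcmp in 1.402 is that a constant-time comparison never exits early, so its running time does not reveal where the first mismatch lies. The function below is a minimal sketch under a hypothetical name, not the kernel's consttime_memequal(); a production version also needs to stop the compiler from optimizing the loop into an early exit.

```c
#include <stddef.h>

/*
 * Returns 1 if the buffers are equal and 0 otherwise (the opposite
 * sense of memcmp). Every byte is always examined.
 */
static int
consttime_equal_sketch(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a, *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i];	/* accumulate any differing bits */
	return diff == 0;
}
```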
 1.401 29-Mar-2018  rmind tcp_urp_drop: fix a bug introduced in 1.390 rev (hi maxv@).
 1.400 29-Mar-2018  maxv Remove TCPREASS_DEBUG. It was introduced 20 years ago when the reassembler
was being developed, but it's irrelevant today. Makes the code clearer.
 1.399 29-Mar-2018  maxv Reorder/Fix comments to clarify.
 1.398 29-Mar-2018  maxv Remove two more 'else' branches.
 1.397 29-Mar-2018  maxv Fix memory leak, we may reallocate 'tcp_saveti' after 'findpcb'. It's not
a tragic bug, because it happens only on sockets with debug enabled.
 1.396 29-Mar-2018  maxv Remove 'else', makes it clearer that we leave.
 1.395 29-Mar-2018  maxv Clarify with KASSERT.
 1.394 29-Mar-2018  maxv Simplify the computation:

m->m_pkthdr.len - sizeof(struct tcphdr) - optlen - hlen
= m->m_pkthdr.len - (sizeof(struct tcphdr) + optlen + hlen)
= m->m_pkthdr.len - [tcp_len]
= toff
 1.393 28-Mar-2018  maxv Several changes in syn_cache_respond:

* Replace idiotic diagnostic check by KASSERT. max_linkhdr+tlen<=MCLBYTES
is a widespread assumption.

* Improve initialization of 'tp'.

* Put panics in dead branches.

* Merge two switches.
 1.392 28-Mar-2018  maxv Remove unused variable.
 1.391 28-Mar-2018  maxv Remove two unused args from syn_cache_get().
 1.390 28-Mar-2018  maxv Dedup: introduce tcp_urp_drop() and use it.
 1.389 28-Mar-2018  maxv Minor changes: style, improve comments (and put them at the correct place),
use NULL for pointers, and add {}s to prevent confusion.
 1.388 23-Mar-2018  maxv Remove #ifdef INET. Nobody is doing that in the kernel, and there are
even IPv4 places that are not covered here.
 1.387 23-Mar-2018  maxv Improve a bit here and there. Replace bcopy by memcpy/memmove.
 1.386 22-Mar-2018  maxv Don't pass a pointer to tcp_reass, otherwise it looks like it can modify
tlen while it doesn't.
 1.385 22-Mar-2018  maxv Rearrange a bit. No real functional change.
 1.384 22-Mar-2018  maxv Don't call tcp_input_checksum again, it was already called earlier, no
need to checksum twice.

Then call tcp_fields_to_host a bit earlier, so that we don't need to call
it in each branch.
 1.383 01-Mar-2018  maxv branches: 1.383.2;
Revert rev1.183 (2003).

It was intended as an optimization, but it increases the attack surface:
the IPsec policy is not enforced on RST packets when the socket is in the
LISTEN state, and an (unauthenticated) attacker could jam the connection
between two IPsec hosts by sending RST packets between the client's SYN
and ACK packets.

Discussed with ozaki-r@.
 1.382 28-Feb-2018  maxv Remove unused ipsec_private.h includes.
 1.381 28-Feb-2018  maxv Remove duplicate IPSEC_STATINC(IPSEC_STAT_IN_POLVIO), ipsec_in_reject
already increases it. IPSEC6_STATINC is now unused, so remove it too.
 1.380 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.379 12-Feb-2018  maxv Remove unused argument from tcp_signature_getsav.
 1.378 12-Feb-2018  maxv Add a KASSERT.
 1.377 12-Feb-2018  maxv Remove the 'm' argument from syn_cache_respond(); all it does with it is
freeing it, so free in the caller instead.
 1.376 12-Feb-2018  maxv Remove this multicast check. Multicast packets are already dropped at
the beginning of the function.
 1.375 09-Feb-2018  maxv Style, and move the 'ip_srcroute' call after 'tcp_dooptions', otherwise
we're leaking 'ipopts'. (Harmless, since TCP_SIGNATURE is disabled.)
 1.374 08-Feb-2018  maxv Style, rename a variable, and remove an unreachable case.
 1.373 08-Feb-2018  maxv Move the IPv4 multicast check earlier; we want to kick multicast packets
all the time, and not just when they are SYNs.

The IPv6 multicast check is already done earlier, so this block of code
can be removed.
 1.372 08-Feb-2018  maxv Remove the unused 'multicast' argument from tcp_vtw_input, and remove
the now-unused multicast detection code. It couldn't have been correct on
IPv6, since multicast packets are kicked at the beginning of the function.
 1.371 08-Feb-2018  maxv Remove the default case, the beginning of the function already ensures
af == AF_INET || af == AF_INET6.
 1.370 08-Feb-2018  maxv Dedup code.
 1.369 08-Feb-2018  maxv Remove the IN6_IS_ADDR_V4MAPPED checks in the protocol functions. They
are useless, because the IPv6 entry point (ip6_input) already performs
them.

The checks were first added in the protocol functions:

Wed Dec 22 04:03:02 1999 UTC (18 years, 1 month ago) by itojun

"drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)"

Shortly afterwards they were also added in the IPv6 entry point, but
were not removed from the protocol functions:

Mon Jan 31 10:33:22 2000 UTC (18 years ago) by itojun

"be proactive about malicious packet on the wire. we fear that v4 mapped
address to be used as a tool to hose security filters (like bypassing
"local host only" filter by using ::ffff:127.0.0.1)."

OpenBSD did the same a few months ago. FreeBSD has never had these checks.
 1.368 08-Feb-2018  maxv Style, and remove outdated comments.
 1.367 08-Feb-2018  maxv Remove this check, it is already done at the beginning of the function.
 1.366 08-Feb-2018  maxv Reduce the indentation level of this huge block (without realigning yet,
for proofreadability). No functional change.
 1.365 08-Feb-2018  maxv Move the SO_DEBUG block earlier, to reduce the indentation level.
 1.364 08-Feb-2018  dholland Typos.
 1.363 15-Nov-2017  ozaki-r Convert SYN_CACHE_TIMER_ARM macro to static inline function (NFC)
 1.362 15-Nov-2017  ozaki-r Make syn_cache_timer static
 1.361 15-Nov-2017  ozaki-r Reduce return points (NFC)
 1.360 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.359 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.358 19-Jun-2017  ozaki-r Fix KASSERT in tcp_input

inp can be NULL when receiving an IPv4 packet on an IPv4-mapped IPv6
address. In that case KASSERT(sotoinpcb(so) == inp) always fails.

Should fix PR kern/52304 (at least it fixes the same panic as the
report)
 1.357 20-Apr-2017  ozaki-r branches: 1.357.4;
Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.356 31-Mar-2017  ozaki-r Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)
 1.355 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.354 07-Feb-2017  ozaki-r Add missing NULL checks for m_get_rcvif
 1.353 04-Jan-2017  kre branches: 1.353.2;

Remove redundant tests: if optlen == 0, then optlen % 4 != 2 (it is 0)
so there is no need to test both.
 1.352 02-Jan-2017  christos Fix TCP signature code:
1. pack options more tightly instead of being generous with no/op
2. put TCP_SIGNATURE option before SACK
3. fix computation of options length, by deferring it
XXX: Really we should move the options setting code in one place instead
of having two copies one for input and one for output.
XXX: tcp_optlen/tcp_hdrsiz need to be fixed; they were wrong before too.
 1.351 31-Dec-2016  christos remove ancient ipsec code, and don't conditionalize tcp signatures on ipsec_used
 1.350 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that makes it difficult to debug by bisecting.
 1.349 15-Nov-2016  mrg apply a #ifdef INET6 so the previous compiles without INET6.
 1.348 15-Nov-2016  mlelstv Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.
 1.347 10-Jun-2016  ozaki-r branches: 1.347.2;
Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.346 10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.345 15-Feb-2016  rtr Reduce code duplication.

Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended. As posted to tech-net@
 1.344 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.343 24-Jul-2015  matt Make sure that snd_win doesn't go negative.
 1.342 15-Jul-2015  ozaki-r Remove unused arguments and the associated code from nd6_nud_hint()

from OpenBSD
 1.341 24-May-2015  rtr remove transitional functions in{,6}_pcbconnect_m() that were used in
converting protocol user requests to accept sockaddr instead of mbufs.

remove tcp_input copy in to mbuf from sockaddr and just copy to sockaddr
to make it possible for the transitional functions to go away.

no version bump since these functions only existed for a short time and
were commented as adapters (they appeared in 7.99.15).
 1.340 15-May-2015  kefren Don't try to do PCB lookup for bad checksummed segments
Fixes PR/43510 and PR/48452
 1.339 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.338 27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.337 14-Mar-2015  rtr Move code that is conditional on options INET6 into #ifdef INET6.

* Re-organize some variable declarations to limit #ifdef's.
* Move INET and INET6 code into respective switch cases to simplify
#ifdef INET6.

No intended functional change.
 1.336 14-Feb-2015  he Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).
 1.335 02-Dec-2014  christos use the new printing code.
 1.334 08-Aug-2014  rtr branches: 1.334.2; 1.334.4;
split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.333 30-May-2014  rmind tcp_signature_getsav: handle !ipsec_used case and fix the build (hi christos!).
 1.332 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.331 01-Mar-2014  maxv branches: 1.331.2;
';;' -> ';'

no functional change

spotted by my code scanner

ok christos@
 1.330 12-Nov-2013  kefren * implement TCP CUBIC congestion control algorithm
* move tcp_sack_newack bits inside reno and newreno_fast_retransmit_newack
* notify ECN peer about cwnd shrink in [new]reno_slow_retransmit

Based on the patch proposed on tech-net@ on Nov 7 with minor improvements:
* adapt wmax for no-fast convergence case
* correct cbrt calculation for big window sizes (>750KB)
 1.329 15-Sep-2013  martin Remove unused variable
 1.328 29-Aug-2013  rmind Remove SS_ISCONFIRMING, it is unused and TP4 will not come back.
 1.327 06-Jun-2013  christos branches: 1.327.2;
merge error paths, pass the address of sav; pointed out by Greg Troxel
 1.326 05-Jun-2013  christos IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.325 22-Jun-2012  christos branches: 1.325.2;
remove unintended commit (this was to avoid a bug in the hme driver which
I have not been able to reproduce)
 1.324 22-Jun-2012  christos PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.323 13-Apr-2012  yamt comment
 1.322 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.321 11-Jan-2012  drochner branches: 1.321.2; 1.321.6; 1.321.8;
fix build in the (FAST_)IPSEC & TCP_SIGNATURE case
 1.320 31-Dec-2011  christos - fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.319 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.318 19-Nov-2011  tls branches: 1.318.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.317 31-Oct-2011  yamt branches: 1.317.2;
fix a double unlock bug introduced by tcp_input.c rev.1.312.
 1.316 31-Aug-2011  plunky NULL does not need a cast
 1.315 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.314 25-May-2011  gdt Remove erroneous additional tick in RTO estimation. The variable
ts_rtt is 1 plus the RTT, so that 0 can mean invalid measurement.
However, the code failed to subtract the 1 back out before use. With
this change, TCP from Massachusetts to France now typically has 1s RTO
values, rather than 1.5s.

This bug was found and fixed by Bev Schwartz of BBN. This material is
based upon work supported by the Defense Advanced Research Projects
Agency and Space and Naval Warfare Systems Center, Pacific, under
Contract No. N66001-09-C-2073. Approved for Public Release,
Distribution Unlimited
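
The off-by-one fixed in 1.314 comes from an encoding in which 0 is reserved for "no valid measurement" and any other stored value is the RTT plus 1. A sketch of that encoding follows; the helper names are hypothetical, and the point is only that the decoder must subtract the 1 again before the value is used.

```c
/* Store an RTT so that 0 stays free to mean "invalid measurement". */
static unsigned int
rtt_encode_sketch(unsigned int rtt_ticks)
{
	return rtt_ticks + 1;
}

/*
 * Returns 1 and writes the real RTT if the stored value is valid,
 * returns 0 if it is the "invalid" marker.
 */
static int
rtt_decode_sketch(unsigned int stored, unsigned int *rtt_ticks)
{
	if (stored == 0)
		return 0;
	*rtt_ticks = stored - 1;	/* the step the old code omitted */
	return 1;
}
```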
 1.313 17-May-2011  dholland typo in comment
 1.312 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
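
The MSLT classification described in 1.312 can be sketched as below. The three classes and their default MSLs (2, 10 and 60 seconds) come from the log message; the enum, the function names and the two boolean inputs are hypothetical simplifications of however the kernel decides that a peer is on the same host or link.

```c
/* Peer "nearness" classes named in the log message. */
enum msl_class_sketch {
	MSL_CLASS_LOOPBACK,	/* local host equals remote host */
	MSL_CLASS_LOCAL,	/* same link/subnet */
	MSL_CLASS_REMOTE	/* reached via one or more gateways */
};

/* Pick the class; the inputs are assumed to be computed elsewhere. */
static enum msl_class_sketch
msl_classify_sketch(int same_host, int same_link)
{
	if (same_host)
		return MSL_CLASS_LOOPBACK;
	if (same_link)
		return MSL_CLASS_LOCAL;
	return MSL_CLASS_REMOTE;
}

/* Default MSL per class, in seconds, as listed in the log message. */
static unsigned int
msl_default_seconds_sketch(enum msl_class_sketch c)
{
	switch (c) {
	case MSL_CLASS_LOOPBACK:
		return 2;
	case MSL_CLASS_LOCAL:
		return 10;
	default:
		return 60;
	}
}
```

Nearer peers get shorter lifetimes, so loopback and same-link sessions leave TIME_WAIT sooner than remote ones.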
 1.311 25-Apr-2011  yamt tcp_input: simplify redundant assignment. no functional changes.
 1.310 20-Apr-2011  wiz Fix typos.
 1.309 20-Apr-2011  gdt Rewrite comments about TCP RTO calculations.

Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code. This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073. Approved for Public
Release, Distribution Unlimited
 1.308 14-Apr-2011  yamt comments
 1.307 09-Mar-2011  yamt fix a typo in rev.1.283, which broke tcp dupack and duppack statistics.
 1.306 02-Dec-2010  plunky branches: 1.306.2;
fix potential mbuf overflow, from Alexander Danilov on tech-net
 1.305 26-May-2010  bouyer Make sure SYN_CACHE_TIMER_ARM() has been run before calling syn_cache_put()
as it will reschedule the timer. Fixes PR kern/43318.
 1.304 21-Apr-2010  bouyer syn_cache_put(): defer all pool_put() to the callout. Reschedule
the callout if needed so frees are not delayed too much.
syn_cache_timer(): we can't call syn_cache_put() here any more,
so move code deleted from syn_cache_put() here.

Avoid KASSERT() in kern_timeout.c because pool_put() is called from
ipintr context, as reported in
http://mail-index.netbsd.org/tech-kern/2010/03/19/msg007762.html
Thanks to Andrew Doran and Mindaugas Rasiukevicius for help and review.
 1.303 16-Apr-2010  rmind tcp_input: set ECE flag even if CWR flag is active.
Submitted by Richard Scheffenegger in PR/43150.
 1.302 01-Apr-2010  tls Oops. Fix LOCKDEBUG panic -- and spurious calls to tcp_output()! -- in
previous. Be careful with that {}, Eugene.
 1.301 01-Apr-2010  tls After discussion with ad@: it appears that KERNEL_LOCK also protects
the driver output path (that is, ifp->if_output()). In the case of
entry through the socket code, we are fine, because pru_usrreq takes
KERNEL_LOCK. However, there are a few other ways to cause output
which require protection:

1) direct calls to tcp_output() in tcp_input()
2) fast-forwarding code (ip_flow) -- protected elsewise
against itself by the softnet lock.
3) *Possibly* the ARP code. I have currently persuaded
myself that it is safe because of how it's called.
4) Possibly the ICMP code.

This change addresses #1 and #2.
 1.300 26-Jan-2010  pooka branches: 1.300.2; 1.300.4;
tcp sockbuf autoscaling was initially added turned off because it
was experimental. People (including myself) have been running with
it turned on for eons now, so flip the default to enabled.
 1.299 09-Sep-2009  darran Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
 1.298 18-Jul-2009  minskim Check the minimum ttl only when pcb is available.
 1.297 17-Jul-2009  minskim Add the IP_MINTTL socket option.

The IP_MINTTL option may be used on SOCK_STREAM sockets to discard
packets with a TTL lower than the option value. This can be used to
implement the Generalized TTL Security Mechanism (GTSM) according to
RFC 3682.

OK'ed by christos@.
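
A hedged userland sketch of how an application might use the option. The helper name set_min_ttl() is hypothetical; IP_MINTTL is used at the IPPROTO_IP level as on systems that provide it, and the fallback branch is only there so the sketch compiles where the option is missing.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>

/*
 * Ask the stack to discard packets arriving on fd whose TTL is below
 * minttl. Returns 0 on success, -1 with errno set on failure.
 */
static int
set_min_ttl(int fd, int minttl)
{
#ifdef IP_MINTTL
    return setsockopt(fd, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl));
#else
    (void)fd;
    (void)minttl;
    errno = ENOPROTOOPT;   /* option not available on this platform */
    return -1;
#endif
}
```

For GTSM between directly connected peers, the sender would typically transmit with a TTL of 255 and the receiver would call set_min_ttl(s, 255) on its SOCK_STREAM socket.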
 1.296 20-Jun-2009  christos Follow exactly the recommendation of draft-ietf-tcpm-tcpsecure-11.txt:
Don't check against the last ack received, but against the expected sequence number.
This makes RST handling independent of delayed ACK. From Joanne M Mikkelson.
 1.295 18-Mar-2009  cegger bzero -> memset
 1.294 18-Mar-2009  cegger bcmp -> memcmp
 1.293 15-Mar-2009  cegger ansify function definitions
 1.292 29-Jan-2009  pooka branches: 1.292.2;
stinkset purge: POOL_INIT -> pool_init
also, make the syncache pool static in scope
 1.291 04-Aug-2008  tls branches: 1.291.2; 1.291.4; 1.291.8;
Unlock reassembly queue before calling sorwakeup(), not after. In unusual
cases with in-kernel consumers which might send data on the same socket,
we can deadlock on the reassembly queue otherwise (observed while testing
accept filters).
 1.290 28-Jul-2008  matt Reacquire softnet_lock after calling soabort which returns with the socket
unlocked.
 1.289 04-Jul-2008  ad branches: 1.289.2;
tcp_input: add a couple of assertions.
 1.288 03-Jul-2008  ad syn_cache_get: remove new endpoint's socket from head's queue if aborting
the connection. Should fix KASSERT(so->so_head == NULL).
 1.287 28-Apr-2008  martin branches: 1.287.2; 1.287.4;
Remove clause 3 and 4 from TNF licenses
 1.286 24-Apr-2008  ad branches: 1.286.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.285 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.284 12-Apr-2008  thorpej branches: 1.284.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
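
The per-CPU scheme can be pictured as one counter array per CPU that is only summed on demand. This is a sketch with hypothetical names and a fixed CPU count; the kernel's actual interfaces are not shown in the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU  4   /* assumed for illustration */
#define NSTAT 3

enum { STAT_RCVTOTAL, STAT_SNDTOTAL, STAT_DROPS };

/* One array of counters per CPU; each CPU only writes its own row. */
static uint64_t percpu_stats[NCPU][NSTAT];

static void
stat_inc(unsigned cpu, unsigned which)
{
    assert(cpu < NCPU && which < NSTAT);
    percpu_stats[cpu][which]++;
}

/* Collate: sum every CPU's row into one result, e.g. for a sysctl reply. */
static void
stats_collate(uint64_t out[NSTAT])
{
    for (unsigned s = 0; s < NSTAT; s++) {
        out[s] = 0;
        for (unsigned c = 0; c < NCPU; c++)
            out[s] += percpu_stats[c][s];
    }
}
```

The hot path touches only CPU-local memory, and the cost of summing is paid only when someone asks for the totals.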
 1.283 08-Apr-2008  thorpej Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
 1.282 01-Mar-2008  rmind Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happened. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.281 27-Feb-2008  matt Convert stragglers to ansi definitions from old-style definitons.
Remember that func() is not ansi, func(void) is.
 1.280 20-Feb-2008  yamt branches: 1.280.2; 1.280.6;
make TCP_SETUP_ACK, ICMP_CHECK, TCP_FIELDS_TO_HOST, and TCP_FIELDS_TO_NET
static functions.
 1.279 05-Feb-2008  yamt - start tcp timestamp from 1 instead of 0.
- add a comment to explain why:
+ * We start with 1, because 0 doesn't work with linux, which
+ * considers timestamp 0 in a SYN packet as a bug and disables
+ * timestamps.
 1.278 04-Feb-2008  yamt redo tcp_input.c rev.1.230 correctly.

revision 1.230
date: 2005/06/30 02:58:28; author: christos; state: Exp; lines: +20 -4
Normalize our PAWS code with Free and Open, as mentioned in tech-security.

reviewed by christos@ and matt@.
 1.277 29-Jan-2008  yamt revert tcp_output.c 1.253 because it has an ill effect when sending
small (not full-sized) segments.
http://mail-index.NetBSD.org/tech-net/2008/01/27/0009.html
 1.276 14-Jan-2008  dyoung Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
 1.275 20-Dec-2007  martin A few missing ifdefs to make non-INET6 kernels build again.
 1.274 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.273 16-Dec-2007  elad Really fix low port allocation, by always passing a valid lwp to
in_pcbbind().

Okay dyoung@.

Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
 1.272 09-Nov-2007  dyoung branches: 1.272.2; 1.272.6;
Change macros SYN_CACHE_PUT() and SYN_CACHE_RM() into inline
subroutines syn_cache_put() and syn_cache_rm().
 1.271 04-Nov-2007  rmind branches: 1.271.2;
Pick the smallest possible TCP window scaling factor that will still allow
us to scale up to sb_max. This might fix the problems with some firewalls.

Taken from FreeBSD (silby).
OK by <dyoung>.
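
The selection rule described above can be sketched as a short loop. The function name is hypothetical; TCP_MAXWIN (65535) and TCP_MAX_WINSHIFT (14) carry their usual values from the window scale option, and sb_max is passed in as a parameter rather than read from the kernel.

```c
#include <assert.h>

#define TCP_MAXWIN       65535  /* largest unscaled window */
#define TCP_MAX_WINSHIFT 14     /* largest permitted shift count */

/*
 * Return the smallest shift such that a scaled 16-bit window can
 * still reach sb_max, capped at TCP_MAX_WINSHIFT.
 */
static int
pick_window_scale(unsigned long sb_max)
{
    int scale = 0;

    while (scale < TCP_MAX_WINSHIFT &&
        ((unsigned long)TCP_MAXWIN << scale) < sb_max)
        scale++;
    return scale;
}
```

With sb_max at 256 KB this yields a shift of 3 rather than a larger value chosen for headroom.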
 1.270 02-Aug-2007  yamt branches: 1.270.2; 1.270.4; 1.270.8; 1.270.10;
our tcp timestamps are in PR_SLOWHZ, not HZ.
 1.269 02-Aug-2007  rmind TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are badly needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.268 09-Jul-2007  ad branches: 1.268.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.267 20-Jun-2007  christos - per socket keepalive settings
- settable connection establishment timeout
 1.266 18-May-2007  riz Fix compilation in the TCP_SIGNATURE case:

- don't use void * for pointer arithmetic
- don't try to modify const parameters

A kernel with 'options TCP_SIGNATURE' works as well as it ever did, now.
(ie, clunky, but passable)
 1.265 18-May-2007  riz Revert a small part of revision 1.254 - remove const qualifier from
the struct tcphdr * argument of tcp_dooptions(). RFC2385 support
(options TCP_SIGNATURE) needs to modify the header during options
processing, and this revision broke it.

OK yamt@.
 1.264 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.263 12-Mar-2007  ad branches: 1.263.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.262 04-Mar-2007  christos branches: 1.262.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.261 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.260 10-Feb-2007  degroote branches: 1.260.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic
 1.259 15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust the callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire, similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.258 09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.257 06-Dec-2006  yamt add some more tcp mowners.
 1.256 06-Dec-2006  yamt - make tcp_reass static.
- constify.
 1.255 16-Nov-2006  christos branches: 1.255.2; 1.255.4;
__unused removal on arguments; approved by core.
 1.254 21-Oct-2006  yamt - constify.
- make tcp_dooptions and tcpipqent_pool static.
 1.253 17-Oct-2006  yamt tcp_input: if we have SACK, don't enter fastrecovery on three dupacks.
otherwise, we can enter fastrecovery due to DSACKs, which we treat
as dupacks here. PR/34748. reviewed by Rui Paulo.
 1.252 15-Oct-2006  rpaulo Move comments to proper places.
 1.251 15-Oct-2006  rpaulo Add a new tcp_congctl(9) structure member for congestion experienced callback.
Needed by HSTCP.
 1.250 12-Oct-2006  rpaulo PR 34776: don't accept TCP connections to broadcast addresses.
Move the multicast/broadcast check above (before creating a syn_cache entry)
By Yasuoka Yasuoka.
 1.249 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.248 10-Oct-2006  rpaulo Revert previous. The check is now done in tcp_congctl.
 1.247 10-Oct-2006  yamt tcp_input: don't call congctl->newack when doing fast retransmit.
 1.246 09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.245 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.244 05-Sep-2006  rpaulo branches: 1.244.2; 1.244.4;
Import of TCP ECN algorithm for congestion control.
Available for both IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
 1.243 07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.242 27-May-2006  bouyer Revert rev 1.241: calling m_makewritable() in tcp_input causes problems when
it has to change the mbuf chain. I experienced a hard hang on a Xen2 domU after
TCP connections have been closed, and a crash has been reported which may be
caused by this too.
 1.241 25-May-2006  bouyer If we're going to byteswap fields in the TCP header, make sure the mbuf
area is writable first.
 1.240 15-Apr-2006  christos branches: 1.240.2;
Coverity CID 1152: Add KASSERT before deref.
 1.239 18-Feb-2006  rpaulo branches: 1.239.2; 1.239.4; 1.239.6;
PR 13952: Noritoshi Demizu: correct the TCP window information update check.
 1.238 02-Feb-2006  riz branches: 1.238.2;
If TCP_SIGNATURE is defined, include netinet6/scope6_var.h for the
prototype of in6_clearscope(). Kernels with options TCP_SIGNATURE now
compile again after the IPv6 scoped address changes.
 1.237 15-Nov-2005  dsl branches: 1.237.2; 1.237.4;
Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind(), which can then allocate a low-numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.236 12-Aug-2005  christos branches: 1.236.6;
If called from syn_cache_add, we need to initialize t_state before calling
tcp_dooptions. Pointed out by yamt.
 1.235 12-Aug-2005  hubertf Clarify comment that "the protocol specification dated September, 1981"
is really RFC 793.
 1.234 11-Aug-2005  christos Don't process TCP options in SYN packets after the connection has
been established. (FreeBSD-SA-05:15.tcp)
 1.233 10-Aug-2005  yamt move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.232 10-Aug-2005  yamt device independent part of ipv6 rx checksum offloading.
 1.231 19-Jul-2005  christos Implement PMTU checks from:

http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html

1. Don't act on ICMP-need-frag immediately if adhoc checks on the
advertised MTU fail. The MTU update is delayed until a TCP retransmit
happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.

From OpenBSD.
 1.230 30-Jun-2005  christos branches: 1.230.2;
Normalize our PAWS code with Free and Open, as mentioned in tech-security.
 1.229 06-Jun-2005  yamt tcp_input: don't overload opti.ts_ecr.
 1.228 29-May-2005  christos - add const
- remove bogus casts
- avoid nested variables
 1.227 26-Apr-2005  manu Fix build problem after recent NAT-T changes
 1.226 03-Apr-2005  yamt tcp_input: update a comment to match with the code.
 1.225 29-Mar-2005  yamt protect tcpipqent with splvm.
 1.224 16-Mar-2005  yamt branches: 1.224.2;
simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
build it directly from the segment list when needed.
 1.223 02-Mar-2005  mycroft Copyright maintenance.
 1.222 28-Feb-2005  jonathan Commit TCP SACK patches from Kentaro A. Kurahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond
to SACKs in wide-area traffic.

There are two independently-observed, as-yet-unresolved anomalies:
First, unexplained delays in fast retransmission
(potentially explainable by a 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unexplained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over. Current hypothesis
is that the anomalies above may in fact be due to link-level (hardware,
driver, HAL, firmware) aberrations in the test setup, affecting both
Kentaro's wired-Ethernet NIC and my two (different) WiFi NICs.
 1.221 26-Feb-2005  perry nuke trailing whitespace
 1.220 03-Feb-2005  perry ANSIfy function declarations
 1.219 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.218 28-Jan-2005  mycroft Several changes based on comparison with NS:

1) dupseg_fix_=true from NS: do not count a segment with completely duplicate
data as a duplicate ack. This can occur due to duplicate packets in the
network, or due to fast retransmit from the other side.

2) dupack_reset_=false from NS: do not reset the duplicate ack counter or exit
fast recovery if we happen to get data or a window update along with a
duplicate ack.

3) In the "very old ack" case that itojun added, send an ACK before dropping
the segment, to try to update the other side's send sequence number.

4) Check the ssthresh crossover point with >= rather than >. Otherwise we
start to do "exponential" growth immediately following recovery, where we
should be doing "linear". This is what NS does.
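
Item 4 can be illustrated with a simplified congestion window increment. This is a sketch of the textbook slow start and congestion avoidance rules with a hypothetical function name; it is not the kernel's code, and it ignores details such as window caps.

```c
#include <assert.h>

/*
 * Per-ack congestion window increment in bytes. At or above ssthresh
 * the window grows by about one segment per round trip ("linear");
 * below it, by one segment per ack ("exponential").
 */
static unsigned
cwnd_increment(unsigned cwnd, unsigned ssthresh, unsigned mss)
{
    assert(cwnd > 0);
    if (cwnd >= ssthresh)           /* note >=, not > */
        return mss * mss / cwnd;
    return mss;
}
```

Right after recovery cwnd equals ssthresh, so a test using > would take the exponential branch for that first ack, which is the behaviour the change avoids.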
 1.217 27-Jan-2005  mycroft There is no reason to adjust ts_recent_age for ts_timebase; it's strictly an
internal variable.
 1.216 27-Jan-2005  mycroft Do the other TCP_PAWS_IDLE check unsigned as well. It doesn't do us any harm,
and it could detect even older time stamps. (Really, to be 100% correct, there
should be a timer that clears these out -- but it probably doesn't matter in
the real world.)
 1.215 27-Jan-2005  mycroft Also check whether an echoed RTT is very large -- this *could* cause the
smoothing function to overflow. I use TCP_PAWS_IDLE (24 days) for this.
 1.214 27-Jan-2005  mycroft Introduce a new state variable, t_partialacks. It has 3 states:
* t_partialacks<0 means we are not in fast recovery.
* t_partialacks==0 means we are in fast recovery, but we have not received
any partial acks yet.
* t_partialacks>0 means we are in fast recovery, and we have received
partial acks.

This is used to implement 2 changes in RFC 3782:
* We keep the notion that we are in fast recovery separate from t_dupacks, so
it is not reset due to out-of-order acks. (This affects both the Reno and
NewReno cases.)
* We only reset the retransmit timer on the first partial ack -- preventing us
from possibly taking one RTO per segment once fast recovery is initiated.

As before, it is hard to measure any difference between Reno and NewReno in the
real-world cases that I've tested.
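
The three states of t_partialacks and the "first partial ack only" rule can be sketched as follows. The helper names are hypothetical and the functions are stripped of everything except the state variable itself.

```c
#include <assert.h>

/* t_partialacks < 0: not in fast recovery; >= 0: in fast recovery. */
static int
in_fast_recovery(int t_partialacks)
{
    return t_partialacks >= 0;
}

/* Entering fast recovery: no partial acks seen yet. */
static void
enter_fast_recovery(int *t_partialacks)
{
    *t_partialacks = 0;
}

/*
 * Account for a partial ack during fast recovery. Returns 1 if the
 * retransmit timer should be reset, which happens only for the first
 * partial ack of this recovery episode.
 */
static int
on_partial_ack(int *t_partialacks)
{
    int first;

    assert(in_fast_recovery(*t_partialacks));
    first = (*t_partialacks == 0);
    (*t_partialacks)++;
    return first;
}

/* Leaving fast recovery on a full ack. */
static void
exit_fast_recovery(int *t_partialacks)
{
    *t_partialacks = -1;
}
```

Because the flag lives apart from t_dupacks, an out-of-order ack that clears the duplicate-ack count would not end the recovery episode in this model.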
 1.213 26-Jan-2005  mycroft Fix two problems in our TCP stack:

1) If an echoed RFC 1323 time stamp appears to be later than the current time,
ignore it and fall back to old-style RTT calculation. This prevents ending
up with a negative RTT and panicking later.

2) Fix NewReno. This involves a few changes:

a) Implement the send_high variable in RFC 2582. Our implementation is
subtly different; it is one *past* the last sequence number transmitted
rather than being equal to it. This simplifies some logic and makes
the code smaller. Additional logic was required to prevent sequence
number wraparound problems; this is not mentioned in RFC 2582.

b) Make sure we reset t_dupacks on new acks, but *not* on a partial ack.
All of the new ack code is pushed out into tcp_newreno(). (Later this
will probably be a pluggable function.) Thus t_dupacks keeps track of
whether we're in fast recovery all the time, with Reno or NewReno, which
keeps some logic simpler.

c) We do not need to update snd_recover when we're not in fast recovery.
See tech-net for an explanation of this.

d) In the gratuitous fast retransmit prevention case, do not send a packet.
RFC 2582 specifically says that we should "do nothing".

e) Do not inflate the congestion window on a partial ack. (This is done by
testing t_dupacks to see whether we're still in fast recovery.)

This brings the performance of NewReno back up to the same as Reno in a few
random test cases (e.g. transferring peer-to-peer over my wireless network).
I have not concocted a good test case for the behavior specific to NewReno.
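
Item 2a mentions sequence number wraparound. The commit message does not show the extra logic it added, but the general technique for comparing 32-bit sequence numbers is the signed-difference test used by the traditional BSD SEQ_LT()-style macros, sketched here as functions with hypothetical names.

```c
#include <stdint.h>

/*
 * Compare TCP sequence numbers modulo 2^32. The difference is taken
 * in unsigned arithmetic and then interpreted as signed, so the result
 * is meaningful as long as the two values are less than 2^31 apart.
 */
static int
seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

static int
seq_geq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}
```

A plain a < b comparison would misorder 0xfffffff0 and 0x00000010, whereas seq_lt() treats the former as earlier because the sequence space has wrapped.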
 1.212 21-Dec-2004  yamt branches: 1.212.2; 1.212.4;
factor out receive side tcp/udp checksum handling code so that they
can be used by eg. packet filters.

reviewed by Christos Zoulas on tech-net@.
(slightly tweaked since then to make tcp and udp similar.)
 1.211 18-Dec-2004  yamt tcp_input: add missing loopback checksum omission code for ipv6.
 1.210 15-Dec-2004  thorpej Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.209 15-Sep-2004  yamt fix ipqent pool corruption problems. make tcp reass code use
its own pool of ipqent rather than sharing it with ip reass code.
PR/24782.
 1.208 26-Jun-2004  itojun correct TCP-MD5 support. Jeff Rizzo
 1.207 23-May-2004  jonathan Remove now-unused variable.
 1.206 18-May-2004  itojun fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
 1.205 07-May-2004  jonathan Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.204 27-Apr-2004  matt When a packet is received that overlaps the left side of the window,
check for RST *before* trimming data and adjust its sequence number.
 1.203 26-Apr-2004  itojun make TCP MD5 signature work with KAME IPSEC (#define IPSEC).

support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
 1.202 26-Apr-2004  matt Remove #else clause of __STDC__
 1.201 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c,
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
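
The NOTE in rev 1.201 contrasts two ways of laying out TCP options: padding each option to a 32-bit boundary, or packing all options densely and padding once at the end. The following is a minimal sketch of the size arithmetic only, not the kernel code. The helper names are hypothetical. The option lengths used in the usage note (18 bytes for TCP-MD5, 10 for timestamps, 3 for window scale) are the standard on-wire lengths.

```python
def pad4(n):
    """Round n up to the next multiple of 4 bytes."""
    return (n + 3) & ~3


def padded_per_option(option_lengths):
    """Option space used when every option is individually padded to 32 bits."""
    return sum(pad4(n) for n in option_lengths)


def packed_then_padded(option_lengths):
    """Option space used when options are packed densely, then padded once."""
    return pad4(sum(option_lengths))
```

For TCP-MD5 plus timestamps, `padded_per_option([18, 10])` uses 32 bytes while `packed_then_padded([18, 10])` uses 28. That difference matters because the TCP header leaves only 40 bytes for options, which is the space that SACK blocks would also have to share.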
 1.200 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.199 25-Apr-2004  itojun fix how we send RST against ACK. markus@openbsd
 1.198 25-Apr-2004  itojun indent for little bit better readability
 1.197 24-Apr-2004  itojun fix comment; we no longer move ip+tcp into the same mbuf
 1.196 22-Apr-2004  ragge Avoid performance problem in tcp_reass() when appending mbufs to a chain
by keeping a pointer to the last mbuf in the chain.
 1.195 20-Apr-2004  itojun follow draft-ietf-tcpm-tcpsecure-00.txt 3.2 (B):
if SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.194 20-Apr-2004  itojun - respond to RST by ACK, as suggested in NISCC recommendation
- rate-limit ACKs against RSTs and SYNs
 1.193 17-Apr-2004  matt If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
 1.192 14-Apr-2004  ragge Add back one line which was accidentally removed (by me) a while ago.
Spotted by Markus Friedl (markus at openbsd.org).
 1.191 29-Mar-2004  atatat Make these compile without INET. tcp_input probably needs a lot more
work...
 1.190 10-Mar-2004  drochner branches: 1.190.2;
fix tcp/udp checksum test in the M_CSUM_NO_PSEUDOHDR case
(this can never have worked)
now I can use a "bge" gigabit interface with hw checksumming
ttcp-t: 2147483648 bytes in 18.31 real seconds = 114527.11 KB/sec +++
woow!
 1.189 26-Feb-2004  itojun KNF
 1.188 02-Jan-2004  itojun some corrections from markus@openbsd;
- callout_ack() was called with wrong argument
- no need for xor with timestamp as we are using arc4random()
- minor typo/cleanup
 1.187 19-Nov-2003  jonathan Footwork for fast-ipsec and IPv6: when compiling sys/netinet/tcp_input.c
for both FAST_IPSEC and INET6, include <netipsec/ipsec6.h>.
 1.186 24-Oct-2003  ragge Fix the bug in the tcp transmit prediction code.
During testing the prediction counters show a hit-rate of about 85% for
packets sent on a local LAN, and better than 99% for intercontinental
high-speed bulk traffic (!).
 1.185 23-Oct-2003  mycroft Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
 1.184 10-Sep-2003  itojun cut-and-paste error. Valeriy E. Ushakov
 1.183 10-Sep-2003  itojun if IPsec inbound policy mismatches, respond to SYN with RST (instead of
just dropping it), allow client to react quickly.
 1.182 06-Sep-2003  itojun clarify flowlabel handling
 1.181 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.180 22-Aug-2003  itojun make sure so is properly initialized
 1.179 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.178 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.177 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.176 21-Aug-2003  jonathan Honour the M_CSUM_NO_PSEUDOHDR, if set on inbound TCP and UDP packets.
Tested against bcm5700 with patched if_bge.c.
 1.175 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.174 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.173 20-Jul-2003  he As a temporary workaround, apply the fix from PR#20390, thereby
cooperating with the callout code in working around the race
condition caused by the TCP code's use of the callout facility.

Instead of unconditionally releasing memory in tcp_close() and
SYN_CACHE_PUT(), check whether any of the related callout handlers
are about to be invoked (but have not yet done callout_ack()), and
if so, just mark the associated data structure (tcpcb or syn cache
entry) as "dead", and test for this (and release storage) in the
callout handler functions.
 1.172 02-Jul-2003  ragge Fix previous bug. Thanks to Enami for spotting the (obvious) error, and
to other people with much help with bug reports etc.
While fixing, change some of the code I added last time to make it
cleaner and simpler.
 1.171 29-Jun-2003  fvdl branches: 1.171.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.170 29-Jun-2003  ragge Add code to remember where in the send queue of mbufs the last packet was
sent from. This change avoids a linear search through all mbufs when using
large TCP windows, and therefore permits high-speed connections over long
distances.

Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km. With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.

After discussions with Matt Thomas and Jason Thorpe.
 1.169 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.168 30-May-2003  itojun inherit IPV6_V6ONLY from listening socket. PR 21713
 1.167 17-May-2003  itojun no need for ip_v recovery in output path too
(tcp_template includes ip_v setting)
 1.166 17-May-2003  itojun ip checksum logic no longer damage ip_v
 1.165 16-May-2003  itojun use strlcpy
 1.164 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.163 01-Mar-2003  thorpej Allow TCP connections to hosts on a local network to use a larger
slow start initial window. Default this larger initial window to
4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
 1.162 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.161 25-Feb-2003  he Swap neighboring lines of callout_init() and bzero() of container
struct in syn_cache_add(); the bzero() invalidates whatever
callout_init() has done (which might matter, but presently doesn't).
 1.160 04-Jan-2003  wiz Spell output with two ts.
 1.159 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.158 22-Oct-2002  thorpej Make sure TF_REQ_TSTMP and TF_REQ_SCALE get set correctly in the new
TCPCB in the passive-open case.

Fixes PR 18677.
 1.157 22-Oct-2002  simonb In tcp_input():
Remove the set-but-not-used "proto" variable.
Guard the "ostate" variable in #ifdef TCP_DEBUG.
Remove the set-but-not-used "parentinpcb" variable in syn_cache_get().
 1.156 16-Oct-2002  itojun correct log_refused check (TH_SYN, !TH_RST and !TH_ACK). PR 18669
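
One reading of the condition named in rev 1.156 (SYN set, RST clear, ACK clear) is sketched below. This is an illustration, not the kernel code. The function name is hypothetical, and the flag values are the standard TCP header bit assignments.

```python
# Standard TCP header flag bits.
TH_SYN = 0x02
TH_RST = 0x04
TH_ACK = 0x10


def is_loggable_refusal(flags):
    """True for a pure connection request: SYN set, RST and ACK both clear."""
    return (flags & (TH_SYN | TH_RST | TH_ACK)) == TH_SYN
```

Masking all three bits and comparing against `TH_SYN` tests the three conditions in one step, so a SYN+ACK or a stray RST is not counted as a refused connection attempt.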
 1.155 11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.154 05-Sep-2002  itojun always consult SS_CANTRCVMORE. PR 18185
 1.153 28-Aug-2002  thorpej Fix a problem introduced in rev 1.103, where we recycle a TIME_WAIT
TCPCB .. the fields need to be converted back to net-order, because
the packet is checksummed after the TCPCB lookup happens.

From YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>.
 1.152 19-Aug-2002  itojun better sync w/kame on deprecated address handling. check af == AF_INET6.
 1.151 19-Aug-2002  itojun pull in deprecated address handling from KAME sys/netinet6/tcp6_input.c.
 1.150 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.149 18-Jul-2002  wrstuden When a new connection arrives on a listening port, copy over the
value of the TCP_NODELAY socket option from the listener to the
newly connected connection. Agrees with how Linux & FreeBSD behave,
and goes more with the spirit of accept(2) creating a socket with
the same properties as the listener.

Analysis by Kevin Lahey. Closes PR 17616 by myself.
 1.148 03-Jul-2002  thorpej Rename sbappend_stream() to sbappendstream(), per suggestion from
Jonathan Stone.
 1.147 03-Jul-2002  thorpej Make insertion of data into socket buffers O(C):
* Keep pointers to the first and last mbufs of the last record in the
socket buffer.
* Use the sb_lastrecord pointer in the sbappend*() family of functions
to avoid traversing the packet chain to find the last record.
* Add a new sbappend_stream() function for stream protocols which
guarantee that there will never be more than one record in the
socket buffer. This function uses the sb_mbtail pointer to perform
the data insertion. Make TCP use sbappend_stream().

On a profiling run, this makes sbappend of a TCP transmission using
a 1M socket buffer go from 50% of the time to .02% of the time.

Thanks to Bill Sommerfeld and YAMAMOTO Takashi for their debugging
assistance!
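
The tail-pointer idea in rev 1.147 can be sketched in a few lines. This is a simplified model, not the socket buffer code: `Mbuf` and `StreamBuf` are hypothetical stand-ins, and the `tail` attribute plays the role the log assigns to sb_mbtail for a stream buffer that holds a single record.

```python
class Mbuf:
    """Hypothetical stand-in for an mbuf: a data chunk plus a next pointer."""

    def __init__(self, data):
        self.data = data
        self.next = None


class StreamBuf:
    """Single-record buffer that remembers its last mbuf."""

    def __init__(self):
        self.head = None
        self.tail = None  # plays the role of sb_mbtail in this sketch

    def append(self, m):
        # Constant time: link after the remembered tail, no chain traversal.
        if self.tail is None:
            self.head = m
        else:
            self.tail.next = m
        self.tail = m

    def contents(self):
        # Walk the chain only when the data is actually consumed.
        out = []
        m = self.head
        while m is not None:
            out.append(m.data)
            m = m.next
        return b"".join(out)
```

Without the remembered tail, each append would have to walk the whole chain first, so the cost of filling a large buffer grows with the square of the number of mbufs.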
 1.146 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
 1.145 29-Jun-2002  yamt split logging code in order to reduce maximum stack usage.
 1.144 11-Jun-2002  itojun share policy-on-pcb for listening socket. sync w/kame
todo: share even more, avoid frequent updates of spidx
 1.143 09-Jun-2002  itojun whitespace
 1.142 28-May-2002  itojun use arc4random() where possible.
XXX is it necessary to do microtime() on tcp syn cache?
 1.141 07-May-2002  matt branches: 1.141.2; 1.141.4;
Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.
 1.140 24-Mar-2002  christos Change the multicast/broadcast test to happen later, and when we are
in listen mode. Fixes panic with telnet ::1 port, where the port is an
ipv4 open port.
 1.139 22-Mar-2002  itojun no need to check in_broadaddr/IN_MULTICAST in dropwithreset label.
suggested by enami
 1.138 22-Mar-2002  itojun make sure we don't touch "ip" in IPv6 path
 1.137 19-Mar-2002  christos Drop connections to the broadcast address. From BUGTRAQ. This is a security
issue because it can by-pass ipf rules unintentionally.
 1.136 12-Mar-2002  itojun support tcp_log_refused for IPv6. From: Andrew Brown <atatat@atatdot.net>
 1.135 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.134 24-Jan-2002  itojun place NRL copyright notice itself, not a reference to it.
 1.133 13-Nov-2001  lukem add RCSIDs
 1.132 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.131 17-Sep-2001  thorpej branches: 1.131.2;
Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.130 11-Sep-2001  thorpej Use callouts for SYN cache timers, rather than traversing time queues
in tcp_slowtimo().
 1.129 10-Sep-2001  thorpej Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second). It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
 1.128 10-Sep-2001  thorpej Change the way receive idle time and round trip time are measured.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timestamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).

Based on similar changes in FreeBSD.
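
The wrap-safe subtraction described in rev 1.128 can be illustrated as follows. This is a minimal sketch that assumes a 32-bit tick counter (the width is an assumption, not stated in the log). Python integers do not wrap, so the mask emulates unsigned arithmetic, and `ticks_since` is a hypothetical helper name.

```python
COUNTER_MASK = 0xFFFFFFFF  # assumption: the tick counter is 32 bits wide


def ticks_since(now, then):
    """Elapsed ticks between two counter samples, correct across one wrap."""
    return (now - then) & COUNTER_MASK
```

If the stamp was taken at 0xFFFFFFFE and the counter has since wrapped to 5, `ticks_since(5, 0xFFFFFFFE)` still yields 7, which is why no per-connection counter needs to be incremented on every slow timeout.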
 1.127 08-Jul-2001  abs branches: 1.127.2; 1.127.4;
Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
 1.126 19-Jun-2001  wiz `existent', not `existant'
 1.125 02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
 1.124 08-May-2001  itojun correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
 1.123 20-Mar-2001  thorpej Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
hash method of generating TCP ISS values. Note, this code is experimental
and disabled by default (experimental enough that I don't export the
variable via sysctl yet, either). There are a couple of issues I'd
like to discuss with Steve, so this code should only be used by people
who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
4.4BSD, timestamps are created by incrementing the tcp_now variable
at 2 Hz; there's even a company out there that uses this to determine
web server uptime. According to Newsham's paper "The Problem With
Random Increments", while NetBSD's TCP ISS generation method is much
better than the "random increment" method used by FreeBSD and OpenBSD,
it is still theoretically possible to mount an attack against NetBSD's
method if the attacker knows how many times the tcp_iss_seq variable
has been incremented. By not leaking uptime information, we can make
that much harder to determine. So, we avoid the leak by giving each
TCP connection a timebase of 0.
 1.122 24-Jan-2001  itojun branches: 1.122.2;
- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.121 10-Dec-2000  itojun remove NRL code leftover. sync with kame
 1.120 19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.119 17-Oct-2000  itojun be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
 1.118 17-Oct-2000  thorpej Add an IP_MTUDISC flag to the flags that can be passed to
ip_output(). This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.

This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.

Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
 1.117 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.116 27-Jul-2000  itojun implement net.inet.tcp.rstppslimit to limit TCP RSTs by packet-per-second
basis. default: 100pps

set default value for net.inet.tcp.rstratelimit to 0 (disabled),
NOTE: it does not work right for smaller-than-1/hz interval. maybe we should
nuke it, or make it impossible to set smaller-than-1/hz value.
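
A packets-per-second limit of the kind rev 1.116 describes can be sketched as a counter that resets once per second. This is an illustration under assumptions, not the kernel implementation: `PpsLimiter` and its fields are hypothetical names, and time is passed in explicitly as seconds.

```python
class PpsLimiter:
    """Allow at most max_pps events within each one-second window."""

    def __init__(self, max_pps):
        self.max_pps = max_pps
        self.window_start = None
        self.count = 0

    def allow(self, now):
        # Open a fresh window on first use or once a full second has passed.
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.max_pps
```

Counting per window, rather than enforcing a minimum gap between packets, avoids the problem noted above with intervals smaller than 1/hz: a burst up to the limit goes through, and anything beyond it is suppressed until the next second.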
 1.115 27-Jul-2000  itojun be proactive about unspecified IPv6 source address. pcb layer uses
unspecified address (::) to mean "unbounded" or "unconnected",
and can be confused by packets from outside.

use of :: as source is not documented well in IPv6 specification.

not sure if it presents a real threat. the worst case scenario is a DoS
against TCP listening socket:
- outsider transmit TCP SYN with :: as IPv6 source
- receiving side creates TCP control block with:
local address = my address
remote address = :: (meaning "unconnected")
state = SYN_RCVD
note that SYN ACK will not be sent due to ip6_output() filter.
this stays until it times out.
- the TCP control block prevents listening TCP control block from
being contacted (DoS).

udp6/raw6 socket may have similar problem, but as they are connectionless,
it may be too much to filter it out.
 1.114 23-Jul-2000  itojun add a DIAGNOSTIC case for MCLBYTES assumption
 1.113 09-Jul-2000  itojun be more cautious about tcp option length field. drop bogus ones earlier.
not sure if there is a real threat or not, but it seems that there's
possibility for overrun/underrun (like non-NOP option with optlen > cnt).
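
The kind of length check rev 1.113 refers to can be sketched as an option walker that rejects any option whose length field is too small or runs past the remaining bytes. This is an illustration, not the kernel code. The function name is hypothetical; the option kinds 0 (end of list) and 1 (NOP) and the kind/length/value layout are the standard TCP option format.

```python
TCPOPT_EOL = 0  # end of option list, single byte
TCPOPT_NOP = 1  # padding, single byte


def parse_tcp_options(opts):
    """Return [(kind, payload), ...], or None if a length field is bogus."""
    parsed = []
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        remaining = len(opts) - i
        if remaining < 2:
            return None  # no room for the length byte
        optlen = opts[i + 1]
        if optlen < 2 or optlen > remaining:
            return None  # would underrun or overrun the option area
        parsed.append((kind, bytes(opts[i + 2:i + optlen])))
        i += optlen
    return parsed
```

Rejecting the whole segment on the first bad length, instead of trying to resynchronise, keeps the walker from ever reading outside the option area.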
 1.112 06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TODO: should implement ppsratecheck(9).
 1.111 05-Jul-2000  thorpej Fix some zero-vs-NULL confusion.
 1.110 02-Jul-2000  itojun repair kernel faithd(8) support. there were two mistakes:
(1) tcp6_input dropped packets for translation
(2) in6_pcblookup_connect was too strict
 1.109 30-Jun-2000  itojun remove old mbuf assumption (ip header and tcp header are on the same mbuf).
this is for m_pulldown use. (sync with kame)
 1.108 05-May-2000  matt branches: 1.108.4;
remove superfluous test (snd_una is always > iss since th_ack must be > iss
(first test at start of case) and th_ack is assigned to snd_una).
 1.107 05-May-2000  matt From PR #3733: Only disarm timer if SYN contained the ACK bit since if
it didn't it would be a crossing/simultaneous SYN and doesn't mean the
remote TCP received our SYN.
 1.106 30-Mar-2000  augustss Remove register declarations.
 1.105 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.104 15-Feb-2000  thorpej Add support for rate-limiting RSTs sent in response to no socket for
an incoming packet. Default minimum interval is 10ms. The interval
is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.
 1.103 12-Feb-2000  thorpej In the tcp_input() path:
- Filter out multicast destinations explicitly for every incoming packet,
not just SYNs. Previously, non-SYN multicast destination would be
filtered out as a side effect of PCB lookup. Remove now redundant
similar checks in the dropwithreset case and in syn_cache_add().
- Defer the TCP checksum until we know that we want to process the
packet (i.e. have a non-CLOSED connection or a listen socket).
 1.102 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.101 22-Dec-1999  itojun drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)
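
The filter described in rev 1.101 amounts to checking both addresses for the ::ffff:a.b.c.d form. Below is a userland sketch using Python's standard `ipaddress` module. It is not the kernel check, and `has_v4_mapped` is a hypothetical helper name.

```python
import ipaddress


def has_v4_mapped(src, dst):
    """True if either IPv6 address is an IPv4-mapped (::ffff:a.b.c.d) address."""
    return any(
        ipaddress.IPv6Address(addr).ipv4_mapped is not None
        for addr in (src, dst)
    )
```

With this check, a packet claiming `::ffff:127.0.0.1` as its source is flagged regardless of its destination, which closes off the local-node impersonation the log entry mentions.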
 1.100 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.99 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.98 11-Dec-1999  itojun implement upper-layer reachability confirmation for IPv6 ND (RFC2461 7.3.1).
fix code to reject "tcp to IPv6 anycast".

sync with recent KAME.
 1.97 08-Dec-1999  itojun do not drop from IP header to tcp option until sbappend(), to reduce
requirement to mbuf chain.
part of KAME sync, committed separately for its (possible) impact.
 1.96 23-Sep-1999  itojun branches: 1.96.2; 1.96.8;
cleanup and correct TCP MSS consideration with IPsec headers.

MSS advertisement must always be:
max(if mtu) - ip hdr siz - tcp hdr siz
We violated this in the previous code so it was fixed.

tcp_mss_to_advertise() now takes af (af on wire) as its argument,
to compute right ip hdr siz.

tcp_segsize() will take care of IPsec header size.
One thing I'm not really sure is how to handle IPsec header size in
*rxsegsizep (inbound segment size estimation).
The current code subtracts possible *outbound* IPsec size from *rxsegsizep,
hoping that the peer is using the same IPsec policy as me.
It may not be applicable, could TCP guru please comment...
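
The MSS advertisement rule stated in rev 1.96 can be worked through numerically. This is a sketch under assumptions: it uses the minimum header sizes (20 bytes for IPv4 and TCP, 40 bytes for IPv6, no options or extension headers), and the function and table names are hypothetical rather than the kernel's tcp_mss_to_advertise().

```python
# Assumption: minimum header sizes, no IP options or IPv6 extension headers.
IP_HDR_SIZE = {"inet": 20, "inet6": 40}
TCP_HDR_SIZE = 20


def mss_to_advertise(if_mtus, af):
    """max(if mtu) - ip hdr siz - tcp hdr siz, for the on-wire address family."""
    return max(if_mtus) - IP_HDR_SIZE[af] - TCP_HDR_SIZE
```

For interfaces with MTUs of 1500 and 576, this gives 1460 over IPv4 and 1440 over IPv6. IPsec overhead does not appear here, consistent with the log's statement that tcp_segsize() accounts for it separately.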
 1.95 10-Sep-1999  simonb s/acknowledgment/acknowledgement/
 1.94 26-Aug-1999  thorpej Fix a problem discovered by the snd_recover update fix. A bit of the
New Reno fast recovery code was being executed even when New Reno was
disabled, resulting in an unfortunate interaction with the traditional
fast recovery code, the end result being that the very condition
that would trigger the traditional fast recovery mechanism caused fast
recovery to be disabled!

Problem reported by Ted Lemon, and some analytical help from Charles Hannum.
 1.93 25-Aug-1999  itojun When listening socket goes away, remove associated syn cache entries.
Stale syn cache entries are useless because none of them will be used
if there is no listening socket, as tcp_input looks up listening socket by
in_pcblookup*() before looking into syn cache.

This fixes race condition due to dangling socket pointer from syn cache
entries to listening socket (this was introduced when ipsec is merged in).

This should preserve currently implemented behavior (but not 4.4BSD
behavior prior to syn cache).

Tested in KAME repository before commit, but we'd better run some
regression tests.
 1.92 23-Aug-1999  christos PR/8254: Wolfgang Rupprecht: Incorrect logging of tcp connections; Fix src/dst
confusion.
 1.91 11-Aug-1999  thorpej Fix a few bugs in the TCP New Reno code:
- Make sure that snd_recover is always at least snd_una. If we don't do
this, there can be confusion when sequence numbers wrap around on a
large loss-free data transfer.
- When doing a New Reno retransmit, snd_una hasn't been updated yet,
and the socket's send buffer has not yet dropped off ACK'd data, so
don't muddle with snd_una, so that tcp_output() gets the correct data
offset.
- When doing a New Reno retransmit, make sure the congestion window is
open one segment beyond the ACK'd data, so that we can actually perform
the retransmit.

Partially derived from, although more complete than, similar changes in
OpenBSD, which in turn originated from Tom Henderson <tomh@cs.berkeley.edu>.
 1.90 11-Aug-1999  thorpej Make sure the echoed RFC 1323 timestamp is valid before using it to
compute the round trip time. From Mark Allman <mallman@lerc.nasa.gov>.
 1.89 22-Jul-1999  itojun - implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.
 1.88 17-Jul-1999  itojun no need to include faith.h on non-IPv6 build, so wrap by #ifdef.
(dunno if it's better to always include it or not)
 1.87 17-Jul-1999  itojun fix faith interface support. need testing.
(i understand this is a dirty hack, of course)
 1.86 14-Jul-1999  itojun Use proper ip protocol # field and tcp hdr on sending RST against SYN,
when ip header and tcp header are not adjacent to each other
(i.e. when ip6 options are attached).

To test this, try
telnet @::1@::1 port
toward a port without responding server. Prior to the fix, the kernel will
generate broken RST packet.
 1.85 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.84 02-Jul-1999  itojun avoid "variable not initialized" warnings on some of the platforms.
 1.83 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.82 23-May-1999  ad Add new sysctl (net.inet.tcp.log_refused) that when set, causes refused TCP
connections to be logged.
 1.81 03-May-1999  thorpej Fix an uninitialized variable that the MIPS compiler caught (but the
SPARC, Alpha, Arm, and i386 compilers missed).
 1.80 29-Apr-1999  thorpej Implement retransmit logic for the SYN cache engine. Fixes a rare condition
where one side can think a connection exists, where the other side thinks
the connection was never established.

The original problem was first reported by Ty Sarna in PR #5909. The
original fix I made to the code didn't cover all cases. The problem this
fix addresses was reported by Christoph Badura via private e-mail.

Many thanks to Bill Sommerfeld for helping me to test this code, and
for finding a subtle bug.
 1.79 22-Apr-1999  simonb Don't extern sb_max, <sys/socketvar.h> provides a definition.
 1.78 09-Apr-1999  kml Ensure that out of window SYNs receive an ACK in response, rather than
being dropped. This fixes a bug reported by Jason Thorpe.
 1.77 05-Feb-1999  matt branches: 1.77.2;
According to Dave Borman, the iss should be using snd_nxt and not rcv_nxt
(from tcp_impl mailing-list).
 1.76 04-Feb-1999  explorer REALLY only update the window when we get an ACK. (the old code seemed broken)
 1.75 24-Jan-1999  thorpej * Completely rewrite syn_cache_respond().
- Don't use tcp_respond(), instead create the tcp/ip header from scratch,
and send it ourself.
- Reuse the mbuf that carried the SYN, or allocate one if that is not
available.
- Cache the route we look up to do the Path MTU Discovery check, and
transfer the reference to that route to the inpcb when the connection
completes.
* Macro'ize a small, but often repeated code fragment.
 1.74 19-Jan-1999  mycroft Don't screw with ip_len; just subtract from it where we actually use the
value.
 1.73 19-Jan-1999  mycroft Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.
 1.72 18-Dec-1998  thorpej Add a lock around the TCPCB's sequence queue, to prevent tcp_drain()
from corrupting the queue if called from a device's interrupt context.

Similar in nature to the problem reported in PR #5684.
 1.71 08-Oct-1998  thorpej Use the pool allocator for ipqent structures.
 1.70 06-Oct-1998  matt Fix boolean dyslexic test. Duh!
 1.69 06-Oct-1998  matt Add a sysctl for newreno (default to off).
 1.68 04-Oct-1998  matt Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP
to NetBSD. Ignore the SACK & FACK stuff for now.
 1.67 19-Sep-1998  mycroft Fix a typo (not mine) in a comment.
 1.66 19-Sep-1998  mycroft If we're in LISTEN state and all of RST, SYN and ACK are clear, send a RST.
 1.65 10-Sep-1998  mouse Create tcp.keepidle, tcp.keepintvl, tcp.keepcnt, tcp.slowhz sysctls.
 1.64 09-Sep-1998  thorpej Use an algorithm similar to that in tcp_notify() to determine if
syn_cache_unreach() should remove the entry, or just continue on.

Algorithm is to only remove the entry if we've had more than one unreach
error and have retransmitted 3 or more times. This prevents the following
scenario, as noted in PR #5909 (PR from Ty Sarna, scenario from
Charles Hannum):

* Host A sends a SYN.
* Host A retransmits the SYN.
* Host B gets the first SYN and sends a SYN-ACK.
* Host B gets the second SYN and sends a SYN-ACK.
* One of the SYN-ACK bounces with an
ICMP unreachable, causing the `SYN cache' entry to be
removed with no notification.
* Host A receives the other SYN-ACK, sends an ACK, and goes to
ESTABLISHED state.

Should fix PR #5909.
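
The removal rule described in 1.64 can be sketched roughly as follows. This is a minimal illustration only, not the committed code: the struct, its field names, and the helper sc_unreach_should_remove() are hypothetical stand-ins for the real syn cache entry.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down SYN cache entry; field names are illustrative. */
struct sc_entry {
	bool seen_unreach;	/* an ICMP unreachable was already seen */
	int  rexmits;		/* SYN,ACK retransmissions so far */
};

/*
 * Return true only when the entry should be removed: we have already
 * seen an earlier unreachable error AND have retransmitted 3 or more
 * times.  Otherwise just remember the error and keep the entry.
 */
static bool
sc_unreach_should_remove(struct sc_entry *sc)
{
	if (!sc->seen_unreach || sc->rexmits < 3) {
		sc->seen_unreach = true;
		return false;
	}
	return true;
}
```

With this rule, a single bounced SYN-ACK (the scenario in PR #5909) only marks the entry, so the ACK that follows the other SYN-ACK can still find it.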
 1.63 02-Aug-1998  thorpej Use the pool allocator for syn_cache entries.
 1.62 17-Jul-1998  thorpej Clarify that we are using the Loss Window if a retransmission occurred
during the three-way handshake.
 1.61 02-Jun-1998  thorpej Add a comment explaining why we do _not_ ACK data that might accompany
a SYN (avoidance of a DoS attack).
 1.60 11-May-1998  thorpej Nuke TUBA per my note to tech-net; there's no reason to keep it around.
 1.59 07-May-1998  thorpej Rework the syn cache code somewhat:
- Don't use home-grown queue manipulation. Use <sys/queue.h> instead. The
data structures are a little larger, but we are otherwise wasting the
memory chunk anyway (we're already a 64-byte malloc bucket).
- Fix a bug in the cache-is-full case: if the oldest element removed from
the first non-empty bucket was the only element in the bucket, the
bucket wouldn't be removed from the bucket cache, causing queue corruption
later.
- Optimize the syn cache timers by using PRT timers rather than home-grown
decrement-and-propagate timers.

This code is now a fair bit smaller, and significantly easier to read
and understand.
 1.58 06-May-1998  thorpej Use macros from tcp_timer.h to manipulate TCP timers, so that their
implementation can be changed easily.
 1.57 03-May-1998  thorpej Once again, move a declaration for the benefit of TUBA (grumble).
 1.56 02-May-1998  thorpej Oops, move a variable declaration so TUBA won't lose.
 1.55 02-May-1998  thorpej Reintroduce the immediate ACK-on-PUSH behavior removed in revision 1.47,
but make the decision to do this dependent on the sysctl variable
net.inet.tcp.ack_on_push, which is disabled by default.
 1.54 29-Apr-1998  matt New TCP reassembly code. The new code reduces the memory needed by
out-of-order packets and builds the infrastructure needed for sending
SACK blocks (to be added shortly).
 1.53 29-Apr-1998  thorpej Change RFC1323 timestamp update rule per Section 3.4 of RFC1323.bis. Old
rule was to update the timestamp if the sequence numbers are in range. New
rule adds a check that the timestamp is advancing, thus preventing our notion
of the most recent timestamp from incorrectly moving backwards.
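
A rough sketch of the updated rule from 1.53, under stated assumptions: struct ts_state and ts_update() are hypothetical names, and the "sequence numbers are in range" test is reduced to a single comparison against the last ACK sent, which is a simplification rather than the exact kernel condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparisons on 32-bit sequence/timestamp values. */
static bool seq_leq(uint32_t a, uint32_t b)   { return (int32_t)(a - b) <= 0; }
static bool tstmp_geq(uint32_t a, uint32_t b) { return (int32_t)(a - b) >= 0; }

/* Hypothetical per-connection state; names are illustrative. */
struct ts_state {
	uint32_t ts_recent;	/* most recent timestamp echoed to the peer */
	uint32_t last_ack_sent;	/* last ACK value we sent */
};

/*
 * Update ts_recent only if the segment is in range (old rule) AND the
 * timestamp is not moving backwards (the added check).
 * Returns true when ts_recent was updated.
 */
static bool
ts_update(struct ts_state *st, uint32_t seg_seq, uint32_t ts_val)
{
	if (seq_leq(seg_seq, st->last_ack_sent) &&
	    tstmp_geq(ts_val, st->ts_recent)) {
		st->ts_recent = ts_val;
		return true;
	}
	return false;
}
```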
 1.52 28-Apr-1998  thorpej Log the peer's IP address on received window scale factors larger than
TCP_MAX_WINSHIFT (14), as recommended in Section 2.3 of RFC1323.
 1.51 13-Apr-1998  kml Fix to ensure that the correct MSS is advertised for loopback
TCP connections by using the MTU of the interface. Also added
a knob, mss_ifmtu, to force all connections to use the MTU of
the interface to calculate the advertised MSS.
 1.50 07-Apr-1998  thorpej Remember any source routes that may have accompanied a SYN.
 1.49 03-Apr-1998  thorpej Now that we have a flags word in the syn cache entry, use a flag to indicate
"peer will do timestamps" rather than a bitfield, and give the now-unused
bit to the hash, making it now 32 bits.
 1.48 03-Apr-1998  thorpej Clean up some comments wrt. the syn cache code.
 1.47 31-Mar-1998  thorpej Back out a change made some time ago, that would cause the NetBSD TCP
to ACK immediately any packet that arrived with PSH set. This breaks
delayed ACKs in a few specific common cases that delayed ACKs were
supposed to help, and ends up not making much (if any) difference in
the case where this ACK-on-PSH change was supposed to help.

Per discussion with several members of the TCPIMPL and TCPSAT IETF
working groups.
 1.46 31-Mar-1998  thorpej Fix a potential-congestion case in the larger initial congestion window
code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN
(active open) or SYN,ACK (passive open) was retransmitted, the initial
congestion window for the first slow start of that connection must be
one segment.
 1.45 19-Mar-1998  kml Fix a retransmission bug introduced by the Brakmo and Peterson
RTO estimation changes. Under some circumstances it would return a value
of 0, while the old Van Jacobson RTO code would return a minimum of 3.
This would result in 12 retransmissions, each 1 second apart.
This takes care of those instances, and ensures that t_rttmin is
used everywhere as a lower bound.
 1.44 19-Feb-1998  thorpej Update copyright (sigh, should have done this long ago).
 1.43 24-Jan-1998  mellon Always set sc->sc_timeout (it was missed in one case). This fixes a problem where SYN cache entries are sometimes timed out almost immediately.
 1.42 24-Jan-1998  mycroft Fix an old editing error from merging a bug fix into Lite,
that might cause us to erroneously drop a FIN.
Also, minor changes so the code looks more like Stevens vol 2 figure 28.30.
 1.41 21-Jan-1998  mellon Never free the mbuf that we give to tcp_respond(). The previous change corrected an inconsistency but in exactly the wrong way.
 1.40 18-Jan-1998  mellon In syn_cache_get(), don't free incoming packet before jumping to resetandabort, but do free it after sending the reset.
 1.39 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.38 31-Dec-1997  thorpej Implement a queue for delayed ACK processing. This queue is used in
tcp_fasttimo() in lieu of scanning all open TCP connections.
 1.37 11-Dec-1997  thorpej Fix the "stretch ACK violation" bug documented in internet draft
draft-ietf-tcpimpl-prob-02.txt. Also, fix another bug in the header
prediction case where an ACK would not be sent when it should be.
 1.36 21-Nov-1997  thorpej Slight change to the previous: just drop the packet in the self-connect
case. Sending an RST to ourselves is a little silly, considering that
we'll just attempt to remove a non-existent compressed state entry and
then drop the packet anyway.
 1.35 21-Nov-1997  thorpej In tcp_input(), if the PCB we lookup for an incoming packet is a listen
socket:
- If we received a SYN,ACK, send an RST.
- If we received a SYN, and the connection attempt appears to come from
itself, send an RST, since it cannot possibly be valid.
 1.34 08-Nov-1997  kml TCP MSS fixes to provide cleaner slow-start and recovery.
 1.33 10-Oct-1997  explorer branches: 1.33.2;
Add hooks to use the kernel random system to generate TCP sequence numbers.
 1.32 22-Sep-1997  thorpej Fix several annoyances related to MSS handling in BSD TCP:
- Don't overload t_maxseg. Previous behavior was to set it to the min
of the peer's advertised MSS, our advertised MSS, and tcp_mssdflt
(for non-local networks). This breaks PMTU discovery running on
either host. Instead, remember the MSS we advertise, and use it
as appropriate (in silly window avoidance).
- Per last bullet, split tcp_mss() into several functions for handling
MSS (ours and peer's), and performing various tasks when a connection
becomes ESTABLISHED.
- Introduce a new function, tcp_segsize(), which computes the max size
for every segment transmitted in tcp_output(). This will eventually
be used to hook in PMTU discovery.
 1.31 28-Jul-1997  thorpej branches: 1.31.2;
Garbage-collect some "extern"s.
 1.30 28-Jul-1997  thorpej Fix a rather severe bug in handling of incoming SYNs for peer/port values
which happen to have a TCB in TIME_WAIT, where an mbuf which had been
advanced past the IP+TCP headers and TCP options would be reused as if
it had not been advanced. Problem found by Juergen Hannken-Illjes, who
also suggested a work-around on which this fix is based.
 1.29 23-Jul-1997  thorpej Pull SYN_cache_branch down into the main line.
 1.28 06-Jul-1997  thorpej Fix an old and obscure TCP bug, brought to my attention by Bill Fenner,
fixed in FreeBSD by John Polstra:

Fix a bug (apparently very old) that can cause a TCP connection to
be dropped when it has an unusual traffic pattern. For full details
as well as a test case that demonstrates the failure, see the
referenced PR (FreeBSD's kern/3998).

Under certain circumstances involving the persist state, it is
possible for the receive side's tp->rcv_nxt to advance beyond its
tp->rcv_adv. This causes (tp->rcv_adv - tp->rcv_nxt) to become
negative. However, in the code affected by this fix, that difference
was interpreted as an unsigned number by max(). Since it was
negative, it was taken as a huge unsigned number. The effect was
to cause the receiver to believe that its receive window had negative
size, thereby rejecting all received segments including ACKs. As
the test case shows, this led to fruitless retransmissions and
eventually to a dropped connection. Even connections using the
loopback interface could be dropped. The fix substitutes the signed
imax() for the unsigned max() function.

Bill informs me that his research indicates this bug appeared in Reno.
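
The signed/unsigned pitfall described in 1.28 can be reproduced in isolation. The sketch below is illustrative only: max_u() and imax_s() stand in for the kernel's max() and imax(), and the two window helpers are hypothetical. It assumes a two's-complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/* Stand-ins for the kernel helpers: one compares unsigned, one signed. */
static uint32_t max_u(uint32_t a, uint32_t b)  { return a > b ? a : b; }
static int32_t  imax_s(int32_t a, int32_t b)   { return a > b ? a : b; }

/* Buggy form: a "negative" difference is seen as a huge unsigned value. */
static uint32_t
window_unsigned(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	return max_u(rcv_adv - rcv_nxt, 0);
}

/* Fixed form: the difference is compared as signed, so it clamps to 0. */
static int32_t
window_signed(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	return imax_s((int32_t)(rcv_adv - rcv_nxt), 0);
}
```

When rcv_nxt has advanced 100 bytes past rcv_adv, window_unsigned() yields a value near 2^32 while window_signed() yields 0.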
 1.27 10-Dec-1996  mycroft branches: 1.27.8;
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.26 15-Sep-1996  mycroft Hash unconnected PCBs.
 1.25 10-Sep-1996  mycroft If we're in SYN-SENT or SYN-RECEIVED state, don't reset the keepalive
timer until we transition to ESTABLISHED state. Suggested by TCP/IP
vol 3.
 1.24 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.23 13-Feb-1996  christos branches: 1.23.4;
netinet prototypes
 1.22 31-Jan-1996  mycroft Ignore FIN if not yet connected.
 1.21 31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.20 21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
 1.19 04-Aug-1995  mycroft branches: 1.19.2;
Encapsulate the test for sending a notification in a macro, sb_notify().
 1.18 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.17 11-Jun-1995  mycroft Oops. Decrement rtt earlier.
 1.16 11-Jun-1995  mycroft As suggested by Brakmo and Peterson:
* Don't add the extra 1/8 of the mss when ramping up the congestion window.
* Scale the RTT values slightly to adjust for rounding errors.
* Set the lower bound of the RTO to RTT+2.
 1.15 11-Jun-1995  mycroft Check for inflated congestion window during header prediction, per Brakmo and
Peterson.
 1.14 04-Jun-1995  mycroft Clean up many more casts.
 1.13 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.12 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.11 14-Oct-1994  mycroft Don't return received data to the user until the initial handshake is complete.
Also use TCPS_HAVEESTABLISHED() in a few other places.
 1.10 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 25-Apr-1994  mycroft As I described this on comp.protocols.tcp-ip:

I've found a problem with the TCP delayed ack algorithm. If the writer's
buffer becomes full before sending an entire window, the writer will stop
and the ack will be delayed and the transmission will be stalled pending
a timeout on (and transmission of) the delayed ack.

As an experiment, I've applied the following patch to my (NetBSD) kernel,
and it alleviates the problem.

The worst case for this change is that the writer sets the PSH bit on
every outgoing packet, in which case delayed ack is effectively disabled.
This is not an issue of correctness, however, and since most vendors use
the PSH bit a bit more intelligently, it doesn't seem like a serious
problem.
 1.7 12-Apr-1994  mycroft Patch from James Carlson to fix TCP stalls.
 1.6 08-Jan-1994  mycroft Remove some extra prototypes.
 1.5 08-Jan-1994  mycroft Prototypes.
 1.4 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.3 18-Dec-1993  mycroft Canonicalize all #includes.
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.19.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.23.4.2 11-Dec-1996  mycroft From trunk:
If we're in SYN-SENT or SYN-RECEIVED state, don't reset the keepalive
timer until we transition to ESTABLISHED state.
 1.23.4.1 10-Dec-1996  mycroft From trunk:
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.27.8.26 16-Jul-1997  thorpej Rearrange things a bit so that TUBA compiles again.
 1.27.8.25 13-Jul-1997  thorpej Pay attention to tcp_do_rfc1323 when creating compressed state for a SYN.
 1.27.8.24 11-Jul-1997  thorpej If we have to abort the connection after the 3-way handshake is completed,
send an RST to the peer.
 1.27.8.23 11-Jul-1997  thorpej In the "received SYN for listen socket" case, correct an off-by-one
botch in the backlog test, which could cause our SYN to not be put in
compressed state.

XXX In a perfect world, we want to create compressed state for _all_ SYNs,
XXX even if the socket queue is full of legit connections, but doing so
XXX would break backlog semantics (because we'd have to drop the connection
XXX before the last ACK that completes the 3-way handshake, and we don't
XXX currently have a way to do that). Needs more thought.
 1.27.8.22 06-Jul-1997  thorpej Update from trunk.
 1.27.8.21 30-Jun-1997  thorpej Clean up the TODO list a bit.
 1.27.8.20 30-Jun-1997  thorpej Correct and document the semantics of error handling in syn_cache_get()
when creating the connection. syn_cache_get() now returns the following:
* NULL: We don't have a SYN for this ACK; send the peer an RST.
* -1: We are unable to create a socket for this connection. We have
sent an ACK,RST to the peer. Since the mbuf is being used to send
the response, caller should not free it.
* -2: Some other error occurred while creating the connection, but we were
able to create a socket. In this case, caller should simply drop the
packet, and let the peer resend the ACK (this is the "abort and retry"
case).
* Else, return value is a pointer to the socket created for the new
connection.
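
A caller might dispatch on the return values documented in 1.27.8.20 roughly as follows. This is a sketch under assumptions: the enum, sc_classify(), and the placeholder struct socket definition are hypothetical, and only the four cases listed above come from the log.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder definition so the sketch is self-contained. */
struct socket { int so_placeholder; };

/* Hypothetical names for what the caller should do next. */
enum sc_action {
	SC_SEND_RST,	/* NULL: no SYN for this ACK, send the peer an RST */
	SC_KEEP_MBUF,	/* -1: ACK,RST already sent using the mbuf; don't free it */
	SC_DROP,	/* -2: drop the packet and let the peer resend the ACK */
	SC_CONTINUE	/* otherwise: a real socket for the new connection */
};

/* Map a syn_cache_get()-style return value to the caller's action. */
static enum sc_action
sc_classify(const struct socket *so)
{
	if (so == NULL)
		return SC_SEND_RST;
	if ((intptr_t)so == -1)
		return SC_KEEP_MBUF;
	if ((intptr_t)so == -2)
		return SC_DROP;
	return SC_CONTINUE;
}
```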
 1.27.8.19 29-Jun-1997  thorpej In tcp_input():
- Perform the DIAGNOSTIC check for TCPS_LISTEN a bit earlier, and remove
a now-unnecessary check for TCPS_LISTEN state in the TCB.
 1.27.8.18 29-Jun-1997  thorpej In syn_cache_get():
- If we abort the connection (due to resource shortage or other error),
make sure to notify tcp_input() upon return.
 1.27.8.17 29-Jun-1997  thorpej In tcp_input():
- Garbage-collect TCPS_LISTEN state from TCP input processing now that
all SYNs are handled with compressed state. Since we should never see
TCPS_LISTEN here, add a DIAGNOSTIC panic() if we encounter it.

In syn_cache_insert():
- Instrument hash collisions.

In syn_cache_get():
- Actually do all of the processing necessary to complete the connection
(not as far as the foreign host is concerned, but rather internal state
housekeeping). This fixes the last of the "weird" problems I have
encountered during normal use and under attack from multiple 10Mb/s
syn floods.

Also, add a few comments to clarify a few bits of code.
 1.27.8.16 28-Jun-1997  thorpej SYN cache can't ever be disabled since we use it for all incoming SYNs.
 1.27.8.15 28-Jun-1997  thorpej KNF.
 1.27.8.14 28-Jun-1997  thorpej In tcp_input(), always create compressed state when we receive a SYN,
rather than attempting to create a TCB, and using compressed state if
that fails. This makes the logic much simpler, removes some code
duplication, and kills the "temporary socket" hack that has historically
existed in SYN handling.

This change eliminates the need for SS_FORCE, since we only create
TCBs on legitimate connections. Previously, SS_FORCE was needed to
bypass the socket queue limit because legitimate connections might
have been blocked out by bogus SYNs that weren't in compressed state.

Update TODO list appropriately. (This was a BIG one.)
 1.27.8.13 28-Jun-1997  thorpej Change some spaces -> tabs in indentation. (Hmm, looks like someone
uses _emacs_ :-)
 1.27.8.12 28-Jun-1997  thorpej In syn_cache_get(), set the state of the new inpcb to INP_BOUND.
If we don't do this, the PCB lookup will fail on subsequent packets
from the peer, causing an RST to be generated.

Couple of minor stylistic changes while I'm here.
 1.27.8.11 26-Jun-1997  thorpej KNF sweep over the syn cache code.
 1.27.8.10 26-Jun-1997  thorpej Remove unnecessary (and incorrect) htonl()s in the multicast test in
syn_cache_add().
 1.27.8.9 26-Jun-1997  thorpej Knock tcp_mss() lossage and type-size problems off the TODO list.
 1.27.8.8 26-Jun-1997  thorpej In syn_cache_respond():
- Fix type size problems, especially in creation of the timestamp option.
- Fix byte order problems in creation of the MSS option.
 1.27.8.7 26-Jun-1997  thorpej tcp_mss() needs to take a u_int, not a u_int16_t.
 1.27.8.6 26-Jun-1997  thorpej Casting pointers to u_int64_t isn't correct. Casting to u_long is fine
for arithmetic operations.
 1.27.8.5 26-Jun-1997  thorpej Oops, remove nested comment in the TODO list.
 1.27.8.4 26-Jun-1997  thorpej Add a TODO list, from Charles M. Hannum.
 1.27.8.3 28-May-1997  mellon Pointers are 64 bits on alpha - fix warning.
 1.27.8.2 14-May-1997  mellon SS_PRIV -> SS_FORCE
 1.27.8.1 14-May-1997  mellon Incorporate David Borman of BSDI's tcp SYN caching patches for
4.4BSD-lite2:

- define non-global syn cache variables
- define syn cache hashing algorithm

in tcp_input():

- package ts_val, ts_ecr and ts_present in a tcp_opt_info
structure so that they can be passed en masse to the
syn_cache code.

if the packet matched a socket that's in the ACCEPTING state:

- if an incoming connection does not yet have a tcpcb, but
it's not a SYN packet, check in the syn cache to see if we
cached the initial SYN. If not, send an RST packet. If
so, and if it's an RST packet, though, just blow away the
cache entry. If there was a cache entry and we aren't
processing an RST packet, create the full-blown connection
now and jump into the part of tcp_input() that deals with
connected sockets.

- if it is a SYN, and sonewconn() wouldn't queue it because
the limit for incoming half-up connections has been
exceeded, but the limit for established connections hasn't
yet been exceeded, then put this connection into the syn
cache.

after we've handled the accepting state:

- call tcp_dooptions with tcp_opt_info structure rather than
discrete option state variables.

- If the connection is half-up, and we get an ACK packet, but
it's not for the SYN we sent, drop the connection and send
an RST, per rfc793, p. 36.

in tcp_dooptions:

- combine all the option state passed as arguments into one
tcp_opt_info structure.

add syn cache management functions, verbatim from David's
patch:

syn_cache_insert: insert a connection into the SYN cache. If
we reach the per-bucket or cache size limit, toss the oldest
entry in the bucket, or if there are no entries in this
bucket yet, go looking for an entry to toss.

syn_cache_timer: blow away aging cache entries.

syn_cache_lookup: find the syn cache entry matching a
particular tcp packet, if any.

syn_cache_get: take an entry out of the cache and make a
socket for it.

syn_cache_reset: zap a connection in the syn cache based on
receipt of an RST packet.

syn_cache_unreach: zap a connection in the syn cache based on
an ICMP unreachable message.

syn_cache_add: given a LISTEN socket and an inbound SYN
request, add an entry to the syn cache and send a SYN,ACK to
the source.

syn_cache_respond: actually sends the SYN,ACK.
 1.31.2.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.31.2.1 29-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.33.2.6 09-May-1998  mycroft Pull up patch from kml.
 1.33.2.5 05-May-1998  mycroft Pull up 1.45, per request of kml.
 1.33.2.4 29-Jan-1998  mellon Pull up 1.37-1.37 (thorpej). Pull up 1.40-1.41 and 1.43 (mellon) Pull up 1.42 (mycroft)
 1.33.2.3 21-Nov-1997  thorpej Pull up from trunk: slight change to previous: don't send RST in the
self-connect case.
 1.33.2.2 21-Nov-1997  thorpej Pull up from trunk: send RST on SYNs that come from themselves, and
if we receive SYN,ACK on a LISTEN socket.
 1.33.2.1 08-Nov-1997  thorpej Pull up from trunk: TCP MSS fixes to provide cleaner slow-start and recovery.
(kml)
 1.77.2.3 03-May-1999  perry branches: 1.77.2.3.2; 1.77.2.3.4;
pullup 1.80->1.81 (thorpej)
 1.77.2.2 29-Apr-1999  perry sync to 1.80 (thorpej)
 1.77.2.1 09-Apr-1999  kml Pullup of 1.78, which fixes the stack so that out of window SYNs are ACKed.
 1.77.2.3.4.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.77.2.3.4.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.77.2.3.4.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.77.2.3.2.3 02-Aug-1999  thorpej Update from trunk.
 1.77.2.3.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.77.2.3.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.96.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.96.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.96.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.96.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.96.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.108.4.15 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #143)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.108.4.14 18-Oct-2002  itojun sys/netinet/tcp_input.c 1.156 via patch

Correct log_refused check (TH_SYN, !TH_RST and !TH_ACK). Fixes PR#18669.

(itojun)
 1.108.4.13 05-Sep-2002  itojun sys/netinet/tcp_input.c 1.154
always consult SS_CANTRCVMORE. PR 18185

(itojun)
 1.108.4.12 03-Apr-2002  he Pull up revisions 1.138-1.140 (via patch, requested by itojun):
Reject TCP SYN packets sent to the broadcast address.
 1.108.4.11 20-Mar-2002  he Pull up revision 1.136 (requested by itojun):
Support tcp_log_refused for IPv6.
 1.108.4.10 24-Jan-2002  he Pull up revision 1.134 (requested by itojun):
Clean up the NRL copyright.
 1.108.4.9 09-May-2001  he Pull up revision 1.124 (requested by itojun):
Correct faith prefix determination.
 1.108.4.8 06-Apr-2001  he Pull up revision 1.122 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.108.4.7 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are rough estimate - we had some trial-and-error in main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.108.4.6 28-Jul-2000  itojun pullup 1.114 -> 1.115 (approved by releng-1-5)

> be proactive about unspecified IPv6 source address. pcb layer uses
> unspecified address (::) to mean "unbounded" or "unconnected",
> and can be confused by packets from outside.
>
> use of :: as source is not documented well in IPv6 specification.
>
> not sure if it presents a real threat. the worst case scenario is a DoS
> against TCP listening socket:
> - - outsider transmit TCP SYN with :: as IPv6 source
> - - receiving side creates TCP control block with:
> local address = my address
> remote address = :: (meaning "unconnected")
> state = SYN_RCVD
> note that SYN ACK will not be sent due to ip6_output() filter.
> this stays until it times out.
> - - the TCP control block prevents listening TCP control block from
> being contacted (DoS).
>
> udp6/raw6 socket may have similar problem, but as they are connectionless,
> it may be too much to filter it out.
 1.108.4.5 23-Jul-2000  itojun pullup 1.113 -> 1.114 (approved by releng-1-5)
add a DIAGNOSTIC case for MCLBYTES assumption
 1.108.4.4 23-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

remove old mbuf assumption (ip header and tcp header are on the same mbuf).
this is for m_pulldown use. (sync with kame)

1.108 -> 1.109 syssrc/sys/netinet/tcp_input.c
1.56 -> 1.57 syssrc/sys/netinet/tcp_output.c
1.91 -> 1.92 syssrc/sys/netinet/tcp_subr.c
 1.108.4.3 20-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)
- add protection mechanism against ND cache corruption due to bad NUD hints.

this is part of:
sys/netinet/icmp6.h 1.9 -> 1.10
sys/netinet/tcp_input.c 1.111 -> 1.112
sys/netinet6/icmp6.c 1.34 -> 1.35
sys/netinet6/nd6.c 1.30 -> 1.31
sys/netinet6/nd6.h 1.14 -> 1.15
 1.108.4.2 16-Jul-2000  itojun pullup 1.112 -> 1.113 (approved by releng-1-5)
date: 2000/07/09 12:49:08; author: itojun; state: Exp; lines: +4 -2
be more cautious about tcp option length field. drop bogus ones earlier.
not sure if there is a real threat or not, but it seems that there's
possibility for overrun/underrun (like non-NOP option with optlen > cnt).
 1.108.4.1 03-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)
repair kernel faithd(8) support. there were two mistakes:
(1) tcp6_input dropped packets for translation
(2) in6_pcblookup_connect was too strict
 1.122.2.16 07-Jan-2003  thorpej Sync with HEAD.
 1.122.2.15 11-Nov-2002  nathanw Catch up to -current
 1.122.2.14 22-Oct-2002  thorpej Sync with HEAD.
 1.122.2.13 18-Oct-2002  nathanw Catch up to -current.
 1.122.2.12 17-Sep-2002  nathanw Catch up to -current.
 1.122.2.11 28-Aug-2002  thorpej Sync with -current.
 1.122.2.10 27-Aug-2002  nathanw Catch up to -current.
 1.122.2.9 01-Aug-2002  nathanw Catch up to -current.
 1.122.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.122.2.7 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.122.2.6 28-Feb-2002  nathanw Catch up to -current.
 1.122.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.122.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.122.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.122.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.122.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.127.4.1 01-Oct-2001  fvdl Catch up with -current.
 1.127.2.7 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.127.2.6 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.127.2.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.127.2.4 16-Mar-2002  jdolecek Catch up with -current.
 1.127.2.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.127.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.127.2.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.131.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.141.4.8 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #1680)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.141.4.7 22-Oct-2003  jmc Pullup rev 1.173 (requested by he in ticket #1530)


Introduce a new INVOKING status for callouts, and use it to close
a race condition in the TCP code. Fixes PR#20390.
 1.141.4.6 17-Jun-2003  msaitoh Pull up revision 1.168 (requested by itojun in ticket #1304):
inherit IPV6_V6ONLY from listening socket. Fixes PR#21713.
 1.141.4.5 23-Oct-2002  lukem Pull up revision 1.158 (requested by thorpej in ticket #938):
Make sure TF_REQ_TSTMP and TF_REQ_SCALE get set correctly in the new
TCPCB in the passive-open case.
Fixes PR 18677.
 1.141.4.4 21-Oct-2002  lukem Pull up revision 1.156 via patch (requested by itojun in ticket #915):
correct log_refused check (TH_SYN, !TH_RST and !TH_ACK). PR 18669
 1.141.4.3 06-Sep-2002  lukem Pull up revision 1.154 via patch (requested by itojun in ticket #775):
always consult SS_CANTRCVMORE. PR 18185
 1.141.4.2 28-Aug-2002  lukem Pull up revision 1.153 (requested by thorpej in ticket #738):
Fix a problem introduced in rev 1.103, where we recycle a TIME_WAIT
TCPCB .. the fields need to be converted back to net-order, because
the packet is checksummed after the TCPCB lookup happens.
From YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>.
 1.141.4.1 21-Jul-2002  lukem Pull up revision 1.149 (requested by wrstuden in ticket #517):
When a new connection arrives on a listening port, copy over the
value of the TCP_NODELAY socket option from the listener to the
newly connected connection. Agrees with how Linux & FreeBSD behave,
and goes more with the spirit of accept(2) creating a socket with
the same properties as the listener.
Analysis by Kevin Lahey. Closes PR 17616 by myself.
 1.141.2.5 29-Aug-2002  gehenna catch up with -current.
 1.141.2.4 20-Jul-2002  gehenna catch up with -current.
 1.141.2.3 15-Jul-2002  gehenna catch up with -current.
 1.141.2.2 20-Jun-2002  gehenna catch up with -current.
 1.141.2.1 30-May-2002  gehenna Catch up with -current.
 1.171.2.11 11-Dec-2005  christos Sync with head.
 1.171.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.171.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.171.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.171.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.171.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.171.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.171.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.171.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.171.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.171.2.1 03-Aug-2004  skrll Sync with HEAD
 1.190.2.8 22-Apr-2005  tron Apply patch (requested by christos in ticket #1445):
Fix TCP performance problems introduced in ticket 1401.
 1.190.2.7 06-Apr-2005  tron Apply patch (requested by christos in ticket #1401):
If an echoed RFC 1323 time stamp appears to be later than the current time,
ignore it and fall back to old-style RTT calculation. This prevents ending
up with a negative RTT and panicking later.
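
A minimal C sketch of the check described above, assuming timestamps are
32-bit tick counters. rtt_from_echo() is a hypothetical helper, not the
function used in tcp_input.c; a return value of -1 stands for "fall back to
old-style RTT calculation".

```c
#include <stdint.h>

/*
 * Hypothetical helper: derive an RTT sample (in ticks) from an echoed
 * RFC 1323 timestamp. Returns -1 when the echo is missing or appears
 * to be later than the current time, so the caller can fall back to
 * the old-style RTT measurement instead of using a negative value.
 */
static int32_t
rtt_from_echo(uint32_t now, uint32_t ts_ecr)
{
	int32_t delta;

	if (ts_ecr == 0)
		return -1;		/* no usable echo */
	delta = (int32_t)(now - ts_ecr);
	if (delta < 0)
		return -1;		/* echo is "in the future" */
	return delta;
}
```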
 1.190.2.6 19-Sep-2004  he branches: 1.190.2.6.2;
Apply patch (requested by yamt in ticket #861):
Fix this so it compiles again; we cannot use the link
set macros for pool initialization on this release branch.
 1.190.2.5 18-Sep-2004  he Pull up revision 1.209 (requested by yamt in ticket #861):
Fix ipqent pool corruption problems. Make the TCP reassembly
code use its own pool of ipqent rather than sharing it with
the IP reassembly code. Fixes PR#24782.
 1.190.2.4 10-May-2004  tron Pull up revision 1.205 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.190.2.3 29-Apr-2004  jmc Pullup rev 1.204 (requested by matt in ticket #213)

When a packet is received that overlaps the left side of the window,
check for RST *before* trimming data and adjust its sequence number.
 1.190.2.2 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #169)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.190.2.1 16-Apr-2004  tron Pull up revision 1.192 (requested by ragge in ticket #140):
Add back one line which was accidentally removed (by me) a while ago.
Spotted by Markus Friedl (markus at openbsd.org).
 1.190.2.6.2.2 22-Apr-2005  tron Apply patch (requested by christos in ticket #1445):
Fix TCP performance problems introduced in ticket 1401.
 1.190.2.6.2.1 06-Apr-2005  tron Apply patch (requested by christos in ticket #1401):
If an echoed RFC 1323 time stamp appears to be later than the current time,
ignore it and fall back to old-style RTT calculation. This prevents ending
up with a negative RTT and panicking later.
 1.212.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.212.4.1 12-Feb-2005  yamt sync with head.
 1.212.2.1 29-Apr-2005  kent sync with -current
 1.224.2.3 26-Oct-2006  ghen Pull up following revision(s) (requested by rpaulo in ticket #1547):
sys/netinet/tcp_input.c: revision 1.250
PR 34776: don't accept TCP connections to broadcast addresses.
Move the multicast/broadcast check above (before creating a
syn_cache entry)
By Yasuoka Yasuoka.
 1.224.2.2 28-Apr-2005  tron branches: 1.224.2.2.2;
Pull up revision 1.227 (requested by manu in ticket #206):
Fix build problem after recent NAT-T changes
 1.224.2.1 04-Apr-2005  tron Pull up revision 1.225 (requested by yamt in ticket #90):
protect tcpipqent with splvm.
 1.224.2.2.2.1 26-Oct-2006  ghen Pull up following revision(s) (requested by rpaulo in ticket #1547):
sys/netinet/tcp_input.c: revision 1.250
PR 34776: don't accept TCP connections to broadcast addresses.
Move the multicast/broadcast check above (before creating a
syn_cache entry)
By Yasuoka Yasuoka.
 1.230.2.10 17-Mar-2008  yamt sync with head.
 1.230.2.9 27-Feb-2008  yamt sync with head.
 1.230.2.8 11-Feb-2008  yamt sync with head.
 1.230.2.7 04-Feb-2008  yamt sync with head.
 1.230.2.6 21-Jan-2008  yamt sync with head
 1.230.2.5 15-Nov-2007  yamt sync with head.
 1.230.2.4 03-Sep-2007  yamt sync with head.
 1.230.2.3 26-Feb-2007  yamt sync with head.
 1.230.2.2 30-Dec-2006  yamt sync with head.
 1.230.2.1 21-Jun-2006  yamt sync with head.
 1.236.6.1 22-Nov-2005  yamt sync with head.
 1.237.4.3 09-Sep-2006  rpaulo sync with head
 1.237.4.2 05-Feb-2006  rpaulo Adapt to in6pcb -> inpcb changes.
 1.237.4.1 05-Feb-2006  rpaulo <netinet6/in6_pcb.h> went away. Bye!
 1.237.2.2 01-Mar-2006  yamt sync with head.
 1.237.2.1 18-Feb-2006  yamt sync with head.
 1.238.2.2 22-Apr-2006  simonb Sync with head.
 1.238.2.1 04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.239.6.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.239.4.1 19-Apr-2006  elad sync with head.
 1.239.2.3 14-Sep-2006  yamt sync with head.
 1.239.2.2 26-Jun-2006  yamt sync with head.
 1.239.2.1 24-May-2006  yamt sync with head.
 1.240.2.1 19-Jun-2006  chap Sync with head.
 1.244.4.3 18-Dec-2006  yamt sync with head.
 1.244.4.2 10-Dec-2006  yamt sync with head.
 1.244.4.1 22-Oct-2006  yamt sync with head
 1.244.2.2 12-Jan-2007  ad Sync with head.
 1.244.2.1 18-Nov-2006  ad Sync with head.
 1.255.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.255.2.2 25-May-2007  pavel Pull up following revision(s) (requested by riz in ticket #670):
sys/netinet/tcp_input.c: revision 1.265
sys/netinet/tcp_input.c: revision 1.266
sys/arch/i386/conf/ALL: revision 1.98
Revert a small part of revision 1.254 - remove const qualifier from
the struct tcphdr * argument of tcp_dooptions(). RFC2385 support
(options TCP_SIGNATURE) needs to modify the header during options
processing, and this revision broke it.
OK yamt@.

Fix compilation in the TCP_SIGNATURE case:
- don't use void * for pointer arithmetic
- don't try to modify const parameters
A kernel with 'options TCP_SIGNATURE' works as well as it ever did, now.
(ie, clunky, but passable)

Add 'options TCP_SIGNATURE' to hopefully keep this code from
invisibly breaking periodically, as it's done a couple times.
 1.255.2.1 24-May-2007  pavel Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the right default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. It lets us
decompress any data, whatever the ratio of decompressed data to compressed
data is.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.260.2.4 07-May-2007  yamt sync with head.
 1.260.2.3 24-Mar-2007  yamt sync with head.
 1.260.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.260.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.262.2.5 20-Aug-2007  ad Sync with HEAD.
 1.262.2.4 15-Jul-2007  ad Sync with head.
 1.262.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.262.2.2 08-Jun-2007  ad Sync with head.
 1.262.2.1 13-Mar-2007  ad Sync with head.
 1.263.2.1 11-Jul-2007  mjf Sync with head.
 1.268.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.270.10.2 02-Aug-2007  yamt our tcp timestamps are in PR_SLOWHZ, not HZ.
 1.270.10.1 02-Aug-2007  yamt file tcp_input.c was added on branch matt-mips64 on 2007-08-02 13:06:31 +0000
 1.270.8.1 13-Nov-2007  bouyer Sync with HEAD
 1.270.4.3 23-Mar-2008  matt sync with HEAD
 1.270.4.2 09-Jan-2008  matt sync with HEAD
 1.270.4.1 06-Nov-2007  matt sync with HEAD
 1.270.2.2 11-Nov-2007  joerg Sync with HEAD.
 1.270.2.1 04-Nov-2007  jmcneill Sync with HEAD.
 1.271.2.3 18-Feb-2008  mjf Sync with HEAD.
 1.271.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.271.2.1 19-Nov-2007  mjf Sync with HEAD.
 1.272.6.2 19-Jan-2008  bouyer Sync with HEAD
 1.272.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.272.2.1 26-Dec-2007  ad Sync with head.
 1.280.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.280.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.280.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.280.2.1 24-Mar-2008  keiichi sync with head.
 1.284.2.1 18-May-2008  yamt sync with head.
 1.286.2.7 11-Aug-2010  yamt sync with head.
 1.286.2.6 11-Mar-2010  yamt sync with head
 1.286.2.5 16-Sep-2009  yamt sync with head
 1.286.2.4 19-Aug-2009  yamt sync with head.
 1.286.2.3 18-Jul-2009  yamt sync with head.
 1.286.2.2 04-May-2009  yamt sync with head.
 1.286.2.1 16-May-2008  yamt sync with head.
 1.287.4.3 31-Jul-2008  simonb Sync with head.
 1.287.4.2 18-Jul-2008  simonb Sync with head.
 1.287.4.1 03-Jul-2008  simonb Sync with head.
 1.287.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.289.2.1 19-Oct-2008  haad Sync with HEAD.
 1.291.8.1 18-Jul-2009  snj branches: 1.291.8.1.2;
Pull up following revision(s) (requested by is in ticket #859):
sys/netinet/tcp_input.c: revision 1.296
Follow exactly the recommendation of draft-ietf-tcpm-tcpsecure-11.txt:
Don't check against the last ack received, but the expected sequence number.
This makes RST handling independent of delayed ACK. From Joanne M Mikkelson.
 1.291.8.1.2.2 20-May-2011  matt bring matt-nb5-mips64 up to date with netbsd-5-1-RELEASE (except compat).
 1.291.8.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.291.4.6 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1973):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.291.4.5 11-Jun-2010  riz branches: 1.291.4.5.2; 1.291.4.5.6;
Pull up following revision(s) (requested by bouyer in ticket #1382):
sys/netinet/tcp_input.c: revision 1.304
sys/netinet/tcp_input.c: revision 1.305
syn_cache_put(): defer all pool_put() to the callout. Reschedule
the callout if needed so frees are not delayed too much.
syn_cache_timer(): we can't call syn_cache_put() here any more,
so move code deleted from syn_cache_put() here.
Avoid KASSERT() in kern_timeout.c because pool_put() is called from
ipintr context, as reported in
http://mail-index.netbsd.org/tech-kern/2010/03/19/msg007762.html
Thanks to Andrew Doran and Mindaugas Rasiukevicius for help and review.
Make sure SYN_CACHE_TIMER_ARM() has been run before calling syn_cache_put()
as it will reschedule the timer. Fixes PR kern/43318.
 1.291.4.4 20-May-2010  snj Revert ticket 1382.
 1.291.4.3 20-May-2010  snj Pull up following revision(s) (requested by bouyer in ticket #1382):
sys/netinet/tcp_input.c: revision 1.304
syn_cache_put(): defer all pool_put() to the callout. Reschedule
the callout if needed so frees are not delayed too much.
syn_cache_timer(): we can't call syn_cache_put() here any more,
so move code deleted from syn_cache_put() here.
Avoid KASSERT() in kern_timeout.c because pool_put() is called from
ipintr context, as reported in
http://mail-index.netbsd.org/tech-kern/2010/03/19/msg007762.html
Thanks to Andrew Doran and Mindaugas Rasiukevicius for help and review.
 1.291.4.2 26-Sep-2009  snj Pull up following revision(s) (requested by darran in ticket #950):
sys/netinet/tcp_input.c: revision 1.299
sys/netinet/tcp_usrreq.c: revision 1.156
sys/netinet/tcp_var.h: revision 1.161
Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
 1.291.4.1 18-Jul-2009  snj Pull up following revision(s) (requested by is in ticket #859):
sys/netinet/tcp_input.c: revision 1.296
Follow exactly the recommendation of draft-ietf-tcpm-tcpsecure-11.txt:
Don't check against the last ack received, but the expected sequence number.
This makes RST handling independent of delayed ACK. From Joanne M Mikkelson.
 1.291.4.5.6.1 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1973):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.291.4.5.2.1 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1973):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.291.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.291.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.292.2.2 23-Jul-2009  jym Sync with HEAD.
 1.292.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.300.4.4 31-May-2011  rmind sync with head
 1.300.4.3 21-Apr-2011  rmind sync with head
 1.300.4.2 05-Mar-2011  rmind sync with head
 1.300.4.1 30-May-2010  rmind sync with head
 1.300.2.2 17-Aug-2010  uebayasi Sync with HEAD.
 1.300.2.1 30-Apr-2010  uebayasi Sync with HEAD.
 1.306.2.1 06-Jun-2011  jruoho Sync with HEAD.
 1.317.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.317.2.1 17-Apr-2012  yamt sync with head
 1.318.2.3 29-Apr-2012  mrg sync to latest -current.
 1.318.2.2 05-Apr-2012  mrg sync to latest -current.
 1.318.2.1 18-Feb-2012  mrg merge to -current.
 1.321.8.1 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1315):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.321.6.1 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1315):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.321.2.1 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1315):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.325.2.3 03-Dec-2017  jdolecek update from HEAD
 1.325.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.325.2.1 23-Jun-2013  tls resync from head
 1.327.2.3 18-May-2014  rmind sync with head
 1.327.2.2 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.327.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.331.2.1 10-Aug-2014  tls Rebase.
 1.334.4.8 28-Aug-2017  skrll Sync with HEAD
 1.334.4.7 05-Feb-2017  skrll Sync with HEAD
 1.334.4.6 05-Dec-2016  skrll Sync with HEAD
 1.334.4.5 09-Jul-2016  skrll Sync with HEAD
 1.334.4.4 19-Mar-2016  skrll Sync with HEAD
 1.334.4.3 22-Sep-2015  skrll Sync with HEAD
 1.334.4.2 06-Jun-2015  skrll Sync with HEAD
 1.334.4.1 06-Apr-2015  skrll Sync with HEAD
 1.334.2.2 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #886):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.334.2.1 21-Feb-2015  martin Pull up following revision(s) (requested by he in ticket #530):
sys/netinet/tcp_output.c: revision 1.180
sys/netinet/tcp_input.c: revision 1.336
sys/netinet/tcp_usrreq.c: revision 1.203
share/man/man4/tcp.4: revision 1.30
sys/netinet/tcp.h: revision 1.31
sys/netinet/tcp_subr.c: revision 1.258
sys/netinet/tcp_var.h: revision 1.176
sys/netinet/tcp_var.h: revision 1.177
sys/sys/param.h: bump revision

Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).

Change the new counter variables in struct tcpcb to uint32_t, as
per christos' comments.
 1.347.2.3 26-Apr-2017  pgoyette Sync with HEAD
 1.347.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.347.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.353.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.357.4.7 13-Sep-2020  martin Pull up following revision(s) (requested by kardel in ticket #1604):

sys/netinet/tcp_input.c: revision 1.420

PR/kern 55567
fix the data-only fast path. RCV.UP and SND.WL1 could be left behind
on long sequences of data only packets. pull them along to avoid relative
sequence wraps.
consistent with FreeBSD

addresses second failure mode of PR/kern 55567.
pullup to netbsd-8
pullup to netbsd-9
 1.357.4.6 03-Sep-2020  martin Pull up following revision(s) (requested by kardel in ticket #1602):

sys/netinet/tcp_input.c: revision 1.419

Fix fast path for uni directional transfers

pure ACK case:
drag snd_wl2 along so only newer
ACKs can update the window size.

also avoids the state where snd_wl2
is eventually larger than th_ack and thus
blocking the window update mechanism and
the connection gets stuck for a loooong
time in the zero sized send window state.

see PR/kern 55567

ok thorpej@, also found in FreeBSD
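
A minimal C sketch of why dragging snd_wl2 along matters, assuming the
classic BSD window-update condition. struct snd_state and both functions are
hypothetical stand-ins for the tcpcb fields and the tcp_input.c logic, not
the actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the send-side state kept in the TCP control block. */
struct snd_state {
	uint32_t snd_wl1;	/* segment sequence number of last window update */
	uint32_t snd_wl2;	/* segment ack number of last window update */
	uint32_t snd_wnd;	/* current send window */
};

/* Wraparound-safe "a < b" in the style of BSD SEQ_LT(). */
static bool seq_lt(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }

/* Classic condition: only a newer segment or a newer ACK may update the window. */
static bool
window_update_allowed(const struct snd_state *s, uint32_t seq, uint32_t ack,
    uint32_t win)
{
	return seq_lt(s->snd_wl1, seq) ||
	    (s->snd_wl1 == seq && (seq_lt(s->snd_wl2, ack) ||
	    (s->snd_wl2 == ack && win > s->snd_wnd)));
}

/* Fast path for a pure ACK: pull snd_wl2 along with the acknowledged data. */
static void
fast_path_pure_ack(struct snd_state *s, uint32_t ack)
{
	s->snd_wl2 = ack;
}
```

If snd_wl2 is left behind while more than 2^31 bytes are acknowledged, the
signed comparison sees the stale snd_wl2 as larger than th_ack and the window
update is refused; updating it on every pure ACK avoids that state.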
 1.357.4.5 08-Jul-2020  martin Apply patch, requested by christos in ticket #1566:

Deduplicate more code and avoid use of uninitialized variables.
 1.357.4.4 07-Jul-2020  martin Pull up following revision(s) (requested by christos in ticket #1566):

sys/netinet/tcp_input.c: revision 1.418 (via patch)

- always set both ip and ip6, otherwise a kernel assertion can be triggered
- move alignment early so that we do less work
 1.357.4.3 30-Mar-2018  martin Pull up following revision(s) (requested by maxv in ticket #662):

sys/netinet/tcp_input.c: revision 1.383 (via patch)

Revert rev1.183 (2003).

It was intended as an optimization, but it increases the attack surface:

the IPsec policy is not enforced on RST packets when the socket is in the
LISTEN state, and an (unauthenticated) attacker could jam the connection
between two IPsec hosts by sending RST packets between the client's SYN
and ACK packets.

Discussed with ozaki-r@.
 1.357.4.2 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years, and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list is basically not to be sav->refcnt == 0. And also
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten from
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The newly added key_validate_savlist validates that each sav list is indeed
sorted (run only if DEBUG because it's not cheap).
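
A sortedness check of this kind can be sketched in C as below. This is not
the key_validate_savlist code: struct sav_sketch and savlist_is_sorted() are
hypothetical, and the ordering (newest add time first) is an assumption made
only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sav entry; addtime mirrors sadb_lifetime_addtime. */
struct sav_sketch {
	uint64_t addtime;
	struct sav_sketch *next;
};

/*
 * Walk the list once and report whether add times never increase from
 * one entry to the next (assumed order: newest first). The walk is
 * linear in the list length, which is why such a check would be kept
 * to debug builds.
 */
static bool
savlist_is_sorted(const struct sav_sketch *head)
{
	for (; head != NULL && head->next != NULL; head = head->next) {
		if (head->addtime < head->next->addtime)
			return false;
	}
	return true;
}
```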
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
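
A small sketch of why the quoted statement has no effect and what deriving the soft limit from the hard limit looks like. struct comb and both function names are hypothetical, reduced stand-ins; the real struct sadb_comb has more fields, and the log itself says the fix may not be fully correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a proposal's lifetime fields. */
struct comb {
	uint64_t hard_addtime;
	uint64_t soft_addtime;
};

/* Old pattern: scales a field that the caller has already zero-cleared. */
static void
setlifetime_old(struct comb *c, uint64_t hard)
{
	assert(c != NULL);
	c->hard_addtime = hard;
	c->soft_addtime = c->soft_addtime * 80 / 100;	/* 0 * 80 / 100 == 0 */
}

/* Fixed pattern: the soft limit is 80% of the corresponding hard limit. */
static void
setlifetime_fixed(struct comb *c, uint64_t hard)
{
	assert(c != NULL);
	c->hard_addtime = hard;
	c->soft_addtime = c->hard_addtime * 80 / 100;
}
```

For a hard limit of 86400 seconds the fixed variant yields a soft limit of 69120 seconds, while the old variant leaves it at 0.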
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may call localcount_drain on the SP while
holding key_so_mtx, say in key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It never changes, so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, which happens for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.357.4.1 21-Jun-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #51):
sys/netinet/tcp_input.c: revision 1.358
tests/net/ipsec/t_ipsec_misc.sh: revision 1.7
Fix KASSERT in tcp_input
inp can be NULL when receiving an IPv4 packet on an IPv4-mapped IPv6
address. In that case KASSERT(sotoinpcb(so) == inp) always fails.
Should fix PR kern/52304 (at least it fixes the same panic as the
report)
--
Add test cases of TCP/IPsec on an IPv4-mapped IPv6 address
It reproduces the same panic reported in PR kern/52304
(but not sure that its cause is also same).
 1.383.2.8 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.383.2.7 30-Sep-2018  pgoyette Sync with HEAD
 1.383.2.6 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.383.2.5 21-May-2018  pgoyette Sync with HEAD
 1.383.2.4 02-May-2018  pgoyette Synch with HEAD
 1.383.2.3 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.383.2.2 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.383.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.408.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.408.2.1 10-Jun-2019  christos Sync with HEAD
 1.414.2.5 08-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #1894):

sys/netinet/tcp_input.c: revision 1.441

tcp_reass: Mitigate CVE-2018-6922 (SegmentSmack)
at a level of FreeBSD, by introducing an arbitrary (100) limit to
the length of TCP reassembly queues:
https://github.com/freebsd/freebsd-src/commit/95a914f6316874f5b0c45d491f2843dc810071ef

Originally authored by ryo@.

We thank Tomoyuki Sahara <tsahara at iij>, who has analyzed the
problem again, updated the patch, and carried out experiments for
vulnerability scenarios. The confidential PR below is based on
his work.

PR security/58708
 1.414.2.4 13-Sep-2020  martin Pull up following revision(s) (requested by kardel in ticket #1081):

sys/netinet/tcp_input.c: revision 1.420

PR/kern 55567
fix the data-only fast path. RCV.UP and SND.WL1 could be left behind
on long sequences of data only packets. pull them along to avoid relative
sequence wraps.
consistent with FreeBSD

addresses second failure mode of PR/kern 55567.
pullup to netbsd-8
pullup to netbsd-9
 1.414.2.3 03-Sep-2020  martin Pull up following revision(s) (requested by kardel in ticket #1074):

sys/netinet/tcp_input.c: revision 1.419

Fix fast path for uni directional transfers

pure ACK case:
drag snd_wl2 along so only newer
ACKs can update the window size.

also avoids the state where snd_wl2
is eventually larger than th_ack and thus
blocking the window update mechanism and
the connection gets stuck for a loooong
time in the zero sized send window state.

see PR/kern 55567

ok thorpej@, also found in FreeBSD
 1.414.2.2 07-Jul-2020  martin Pull up following revision(s) (requested by christos in ticket #985):

sys/netinet/tcp_input.c: revision 1.418

- always set both ip and ip6, otherwise a kernel assertion can be triggered
- move alignment early so that we do less work
 1.414.2.1 10-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #193):

sys/netinet/tcp_timer.h: revision 1.30
sys/netinet/tcp_input.c: revision 1.415
sys/netinet/tcp_usrreq.c: revision 1.225
sys/netinet/tcp_subr.c: revision 1.283

Clamp tcp timer quantities to reasonable ranges.
 1.424.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.428.4.1 01-Aug-2021  thorpej Sync with HEAD.
 1.438.2.1 08-Oct-2024  martin Pull up following revision(s) (requested by rin in ticket #932):

sys/netinet/tcp_input.c: revision 1.441

tcp_reass: Mitigate CVE-2018-6922 (SegmentSmack)
at a level of FreeBSD, by introducing an arbitrary (100) limit to
the length of TCP reassembly queues:
https://github.com/freebsd/freebsd-src/commit/95a914f6316874f5b0c45d491f2843dc810071ef

Originally authored by ryo@.

We thank Tomoyuki Sahara <tsahara at iij>, who has analyzed the
problem again, updated the patch, and carried out experiments for
vulnerability scenarios. The confidential PR below is based on
his work.

PR security/58708
 1.439.2.1 02-Aug-2025  perseant Sync with HEAD
 1.222 08-Sep-2024  rillig fix a/an grammar in obvious cases
 1.221 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.220 29-Jun-2024  riastradh branches: 1.220.2;
netinet: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.219 13-Sep-2023  bouyer Handle EHOSTDOWN the same way as EHOSTUNREACH and ENETDOWN for established
connections. Avoid premature end of tcp connection with "Host is down" error
in case of transient link-layer failure.
Discussed and patch proposed in
http://mail-index.netbsd.org/tech-net/2023/09/11/msg008610.html
and followups.
 1.218 04-Nov-2022  ozaki-r branches: 1.218.2;
inpcb: rename functions to in6pcb_*
 1.217 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.216 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.215 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.214 30-Dec-2021  andvar s/bandwith/bandwidth/
 1.213 12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.212 17-Nov-2019  mlelstv Don't allow zero sized segments that will panic the stack.
Reported-by: syzbot+5542516fa4afe7a101e6@syzkaller.appspotmail.com
 1.211 25-Feb-2019  maxv Improve panic messages.
 1.210 27-Dec-2018  maxv Remove unused arguments.
 1.209 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
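
A short illustration of the truncation this message is about, assuming a platform where unsigned int is 32 bits wide. uimin here is written out locally with the shape the message describes; clamp_truncating and clamp_full_width are hypothetical helpers for the demonstration only.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the renamed helper: defined on unsigned int only. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but each argument may be evaluated twice. */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* 64-bit operands are silently converted to unsigned int at the call. */
static uint64_t
clamp_truncating(uint64_t len, uint64_t limit)
{
	/* The sketch only makes sense if unsigned int is narrower. */
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	return uimin(len, limit);
}

/* The comparison is done at the operands' own width. */
static uint64_t
clamp_full_width(uint64_t len, uint64_t limit)
{
	return MIN(len, limit);
}
```

Clamping 0x100000005 to a limit of 100 gives 5 through the function, because the high bits are dropped before the comparison, and 100 through the macro.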
 1.208 17-May-2018  maxv branches: 1.208.2;
Remove reference to tcpiphdr in comment.
 1.207 07-May-2018  uwe Fix unsigned wraparound on window size calculations.

This is another instance where tp->rcv_adv - tp->rcv_nxt can wrap
around after successful zero-window probe from the peer. The first
one was fixed by chs@ in revision 1.112 on 2004-05-08.

While here, CSE and de-obfuscate the code a bit.
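
A sketch of the wraparound, assuming 32-bit sequence numbers. The log does not show the actual change in this revision; the guarded variant is a common idiom for this class of bug, not a copy of the fix. SEQ_GT is defined locally in the conventional signed-difference form, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Sequence-space comparison via signed difference (conventional form). */
#define SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)

/* Naive difference: wraps to a huge value once rcv_nxt passes rcv_adv. */
static uint32_t
adv_window_naive(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return rcv_adv - rcv_nxt;
}

/* Guarded difference: treat "already past the advertised edge" as zero. */
static uint32_t
adv_window_guarded(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	uint32_t win = SEQ_GT(rcv_adv, rcv_nxt) ? rcv_adv - rcv_nxt : 0;

	/* A valid remaining window is always below half the sequence space. */
	assert(win < UINT32_C(0x80000000));
	return win;
}
```

After a zero-window probe is accepted, rcv_nxt can sit one byte past rcv_adv; the naive form then reports a window of 0xFFFFFFFF where the guarded form reports 0.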
 1.206 03-May-2018  maxv Remove now unused tcpip.h includes. Some were already unused before.
 1.205 03-Apr-2018  maxv bcopy -> memcpy, it's obvious the areas don't overlap.
 1.204 01-Apr-2018  maxv Change the check to be <= instead of <. This fixes one occurrence of an
apparently widespread division-by-zero bug in our TCP code: if a user adds
huge IPv6 options with setsockopt, and if the total size of the options
happens to be equal to the available space calculated for the TCP payload,
t_segsz gets set to zero, and given that we then divide several things by
it, the kernel crashes.
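
A hedged sketch of the boundary case: when the options exactly fill the available space, a "<" check lets a zero segment size through and a later division faults. The names segsize_checked and segments_needed are hypothetical, and the real kernel code is structured differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compute the payload size left after options.
 * Returns 0 and stores the size, or -1 if no room is left.
 */
static int
segsize_checked(int avail, int optlen, int *segsz)
{
	int size = avail - optlen;

	assert(segsz != NULL);
	if (size <= 0)		/* "< 0" would accept size == 0 */
		return -1;
	*segsz = size;
	return 0;
}

/* A consumer that divides by the segment size, so it must be nonzero. */
static int
segments_needed(int len, int segsz)
{
	assert(segsz > 0);
	return (len + segsz - 1) / segsz;
}
```

With 1440 bytes available and 1440 bytes of options the checked helper reports failure instead of handing a zero to the division.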
 1.203 01-Apr-2018  maxv Reorder and style, for clarity.
 1.202 30-Mar-2018  maxv Remove dead code. It was introduced in rev1 (25 years ago), and is
irrelevant today.
 1.201 30-Mar-2018  maxv Style, use NULL for pointers, use KASSERT, and don't inline huge functions,
we want to debug them with DDB (and not just with GPROF).
 1.200 29-Mar-2018  maxv Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to
understand.

Also make tcp6_mtudisc() static in tcp_subr.c.
 1.199 10-Mar-2018  khorben Fix spello in a comment
 1.198 12-Feb-2018  maxv branches: 1.198.2;
Remove unused argument from tcp_signature_getsav.
 1.197 03-Aug-2017  ozaki-r Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
 1.196 02-Jun-2017  ozaki-r branches: 1.196.2;
Assert inph_locked on ipsec_pcb_skip_ipsec (was IPSEC_PCB_SKIP_IPSEC)

The assertion confirms SP caches are accessed under inph lock (solock).
 1.195 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.194 04-Jan-2017  martin branches: 1.194.2;
Fix optlen calculation for the SACK block - 2 bytes too few were
calculated, causing corruption in PR kern/51767.
 1.193 04-Jan-2017  kre Remove redundant tests: if optlen == 0, then optlen % 4 != 2 (it is 0)
so there is no need to test both.
 1.192 03-Jan-2017  christos use symbolic constants; no functional change.
 1.191 03-Jan-2017  christos put it the way we had it before; since we check for the resulting size after
we added the extra space we can be equal to the size of the buffer.
 1.190 03-Jan-2017  christos fix off-by-one
 1.189 02-Jan-2017  christos make sure that the reset label is defined without TCP_SIGNATURE.
 1.188 02-Jan-2017  christos Fix TCP signature code:
1. pack options more tightly instead of being generous with no/op
2. put TCP_SIGNATURE option before SACK
3. fix computation of options length, by deferring it
XXX: Really we should move the options setting code in one place instead
of having two copies one for input and one for output.
XXX: tcp_optlen/tcp_hdrsiz need to be fixed; they were wrong before too.
 1.187 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
points. So we need to protect rtentries somehow say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.186 10-Jun-2016  ozaki-r branches: 1.186.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.185 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.184 24-Jul-2015  matt If we are sending a window probe and there's unacked data in the socket, make
sure at least the persist timer is running.
 1.183 16-May-2015  kefren Don't put segment on the wire if security request can't be fulfilled
 1.182 27-Apr-2015  christos Apply Revision 220794 from FreeBSD to avoid dup ACKs:

When checking to see if a window update should be sent to the remote peer,
don't force a window update if the window would not actually grow due to
window scaling. Specifically, if the window scaling factor is larger than
2 * MSS, then after the local reader has drained 2 * MSS bytes from the
socket, a window update can end up advertising the same window. If this
happens, the supposed window update actually ends up being a duplicate ACK.
This can result in an excessive number of duplicate ACKs when using a
higher maximum socket buffer size.

Pointed out by Ricky Charlet, in tech-net.
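
The condition described here can be sketched as a comparison of the old and new windows after right-shifting by the receive window scale, since only the shifted value travels in the TCP header. window_update_visible is a hypothetical name, and the sketch leaves out the clamping and other checks the real code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest shift permitted for the TCP window scale option. */
#define MAX_WINSHIFT	14

/*
 * Return true only if the peer would see a different window value,
 * i.e. the scaled (right-shifted) windows differ.
 */
static bool
window_update_visible(uint32_t oldwin, uint32_t newwin, unsigned rcv_scale)
{
	assert(rcv_scale <= MAX_WINSHIFT);
	return (oldwin >> rcv_scale) != (newwin >> rcv_scale);
}
```

With a scale of 14 the granularity is 16384 bytes, so growing a 65536-byte window by 2 * 1460 bytes still advertises the same value, and sending the update would amount to a duplicate ACK.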
 1.181 27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.180 14-Feb-2015  he Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).
 1.179 10-Nov-2014  maxv branches: 1.179.2;
Do not uselessly include <sys/malloc.h>.
 1.178 25-Oct-2014  christos Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks
to Jonathan Looney for pointing this out.
 1.177 21-Oct-2014  hikaru Fix wrong condition checking TSO capability.
ipsec_used is not necessary condition.
IPsec outbound policy will not be checked when ipsec_used is false.
 1.176 30-May-2014  christos branches: 1.176.2;
Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed.
ipsec_used is set automatically based on IPsec being enabled and
rules existing.
 1.175 05-Jun-2013  christos branches: 1.175.2; 1.175.6;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.174 22-Mar-2012  drochner branches: 1.174.2;
remove KAME IPSEC, replaced by FAST_IPSEC
 1.173 31-Dec-2011  christos branches: 1.173.2; 1.173.6; 1.173.8;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.172 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.171 14-Apr-2011  yamt branches: 1.171.4; 1.171.8;
simplify a compile-time assertion
 1.170 21-Mar-2011  matt Clean up setting ECN bit in TOS. Fixes PR 44742
 1.169 26-Jan-2010  pooka branches: 1.169.4; 1.169.6;
tcp sockbuf autoscaling was initially added turned off because it
was experimental. People (including myself) have been running with
it turned on for eons now, so flip the default to enabled.
 1.168 18-Mar-2009  cegger bzero -> memset
 1.167 28-Apr-2008  martin branches: 1.167.8; 1.167.10; 1.167.14; 1.167.16; 1.167.20;
Remove clause 3 and 4 from TNF licenses
 1.166 12-Apr-2008  thorpej branches: 1.166.2; 1.166.4;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.165 08-Apr-2008  thorpej Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
 1.164 14-Jan-2008  dyoung branches: 1.164.6;
Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
 1.163 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.162 02-Sep-2007  dyoung branches: 1.162.6; 1.162.8; 1.162.12;
m_copy() was deprecated, apparently, long ago. m_copy(...) ->
m_copym(..., M_DONTWAIT).
 1.161 02-Aug-2007  yamt branches: 1.161.2; 1.161.4; 1.161.6;
make rfbuf_ts a tcp timestamp so that calculations in tcp_input make sense.
 1.160 02-Aug-2007  rmind TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are very needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.159 18-May-2007  riz branches: 1.159.2;
Fix compilation in the TCP_SIGNATURE case:

- don't use void * for pointer arithmetic
- don't try to modify const parameters

A kernel with 'options TCP_SIGNATURE' works as well as it ever did, now.
(ie, clunky, but passable)
 1.158 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.157 04-Mar-2007  christos branches: 1.157.2; 1.157.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.156 22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.155 21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.154 10-Feb-2007  degroote branches: 1.154.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic
 1.153 25-Nov-2006  yamt branches: 1.153.2; 1.153.4;
move tso-by-software code to their own files. no functional changes.
 1.152 23-Nov-2006  martin Make it compile on IPv4-only kernels
 1.151 23-Nov-2006  yamt implement ipv6 TSO.
partly from Matthias Scheler. tested by him.
 1.150 17-Oct-2006  yamt tcp_output: as a comment in tcp_sack_newack says, actually send
one or two segments on partial acks. even if sack_bytes_rxmt==0,
if we are in fast recovery with sack, snd_cwnd has somewhat special
meaning here. PR/34749.
 1.149 09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.148 08-Oct-2006  yamt tcp_output: don't make TSO duplicate CWR/ECE.
 1.147 08-Oct-2006  yamt tcp_output: don't try to send SACK option larger than txsegsize.
fix a panic like "panic: m_copydata: off 0, len -7".
 1.146 07-Oct-2006  yamt tcp_output: remove duplicated code and tweak indent. no functional changes.
 1.145 01-Oct-2006  dbj back out revision 1.144 calculating txsegsizep since it unmasks
other bugs. See PR kern/34674
 1.144 28-Sep-2006  dbj consider sb_lowat when limiting the transmit length to keep acks on the wire
 1.143 05-Sep-2006  rpaulo branches: 1.143.2; 1.143.4;
Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
 1.142 25-Mar-2006  seanb Slight simplification of hdr len calculation in tcp_segsize().
No functional change.
 1.141 24-Dec-2005  perry branches: 1.141.4; 1.141.6; 1.141.8; 1.141.10; 1.141.12;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.140 11-Dec-2005  christos merge ktrace-lwp.
 1.139 10-Aug-2005  yamt wrap INET-only code by #if defined(INET).
 1.138 10-Aug-2005  yamt ipv6 tx checksum offloading. reviewed by Jason Thorpe.
 1.137 19-Jul-2005  christos Implement PMTU checks from:

http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html

1. Don't act on ICMP-need-frag immediately if adhoc checks on the
advertised MTU fail. The MTU update is delayed until a TCP retransmit
happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.

From OpenBSD.
 1.136 28-Jun-2005  drochner branches: 1.136.2;
typo in comment
 1.135 29-May-2005  christos - add const
- remove bogus casts
- avoid nested variables
 1.134 08-May-2005  yamt tcp_output: account FIN when building sack option.
 1.133 08-May-2005  yamt tcp_output: don't try to send more data than we have. PR/30160.
 1.132 08-May-2005  yamt tcp_output: clear TH_FIN where appropriate. related to PR/30160.
 1.131 18-Apr-2005  yamt add a function to handle M_CSUM_TSOv4 by software.
 1.130 18-Apr-2005  yamt fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.
 1.129 29-Mar-2005  yamt tcp_output: lock reass queue when building sack.
 1.128 16-Mar-2005  yamt branches: 1.128.2;
simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
build it directly from the segment list when needed.
 1.127 16-Mar-2005  yamt - use full sized segments unless we actually have SACKs to send.
- avoid TSO duplicate D-SACK.
- send SACKs regardless of TF_ACKNOW.
- don't clear rcv_sack_num when transmitting.

discussed on tech-net@.
 1.126 12-Mar-2005  yamt don't try to use TSO to transmit a single segment.
- there's no benefit.
- rtl8169 seems to be stuck with it.
 1.125 09-Mar-2005  matt For AF_INET, always set m->m_pkthdr.csum_data. Don't or TSOv4, just set it.
 1.124 07-Mar-2005  yamt tcp_sack_option: the max number of sack blocks in a packet is 4, not 3.
 1.123 06-Mar-2005  thorpej Add a /*CONSTCOND*/ to last.
 1.122 06-Mar-2005  matt Fix typo. Opposite of >= is <, not ==.
 1.121 06-Mar-2005  matt Replace some gotos with a do while (0) and breaks. No functional change.
 1.120 06-Mar-2005  matt Add IPv4/TCP hooks for TCP Segment Offload on transmit.
 1.119 02-Mar-2005  mycroft Copyright maintenance.
 1.118 28-Feb-2005  jonathan Commit TCP SACK patches from Kentaro A. Kurahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond
to SACKs in wide-area traffic.

There are two independently-observed, as-yet-unresolved anomalies:
First, unexplained delays in fast retransmission
(potentially explainable by a 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unexplained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over. Current hypothesis
is that the anomalies above may in fact be due to link-level (hardware,
driver, HAL, firmware) aberrations in the test setup, affecting both
Kentaro's wired-Ethernet NIC and my two (different) WiFi NICs.
 1.117 26-Feb-2005  perry nuke trailing whitespace
 1.116 03-Feb-2005  perry ANSIfy function declarations
 1.115 15-Dec-2004  thorpej branches: 1.115.2; 1.115.4;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.114 20-May-2004  jonathan With FAST_IPSEC, include <netipsec/key.h>, as Itojun's recent changes
now require KEY_FREESAV() to be in scope.
 1.113 18-May-2004  itojun fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
 1.112 08-May-2004  chs work around an LP64 problem where we report an excessively large window
due to incorrect mixing of types.
 1.111 26-Apr-2004  itojun make TCP MD5 signature work with KAME IPSEC (#define IPSEC).

support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
 1.110 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c,
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.109 30-Mar-2004  christos Make sure we disarm the persist timer before we arm the rexmit
timer, otherwise there is a tiny window where both timers are
active, and this is not correct according to the comments in the
code. I believe that this is the cause of the to_ticks <= 0 assertion
failure in callout_schedule() that I've been getting.
 1.108 03-Mar-2004  thorpej branches: 1.108.2;
Use IPSEC_PCB_SKIP_IPSEC() to short-circuit calls to ipsec{4,6}_hdrsiz_tcp().
 1.107 04-Feb-2004  itojun deal with IPv6 path MTU < 1280 (RFC2460 section 5 last paragraph).
check if there really is room for TCP data.
 1.106 12-Nov-2003  ragge Remove the FAST_MBSEARCH ifdef, send packet prediction is now default.
 1.105 24-Oct-2003  ragge Fix the bug in the tcp transmit prediction code.
During testing the prediction counters show a hit-rate of about 85% for
packets sent on a local LAN, and better than 99% for intercontinental
high-speed bulk traffic (!).
 1.104 24-Oct-2003  enami Make this file compile again when TCP_OUTPUT_COUNTERS defined.
 1.103 23-Oct-2003  thorpej Oops, FAST_MBSEARCH counters were swapped; fix it. Pointed out by yamt@.
 1.102 21-Oct-2003  thorpej Add event counters that measure FAST_MBSEARCH.
 1.101 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.100 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.99 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.98 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.97 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.96 02-Jul-2003  ragge Make the fast-search stuff an option. There are still reports of
problems with it.
 1.95 02-Jul-2003  ragge Fix previous bug. Thanks to Enami for spotting the (obvious) error, and
to other people with much help with bug reports etc.
While fixing, change some of the code I added last time to make it
cleaner and simpler.
 1.94 30-Jun-2003  ragge branches: 1.94.2;
Disable the code I checked in yesterday; reports that samba (!) are crashing
machines with it. Will do some more tests.
 1.93 29-Jun-2003  fvdl Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.92 29-Jun-2003  ragge Add code to remember where in the send queue of mbufs the last packet was
sent from. This change avoids a linear search through all mbufs when using
large TCP windows, and therefore permits high-speed connections on long
distances.

Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km. With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.

After discussions with Matt Thomas and Jason Thorpe.
 1.91 17-May-2003  itojun no need for ip_v recovery in output path too
(tcp_template includes ip_v setting)
 1.90 01-Mar-2003  thorpej Allow TCP connections to hosts on a local network to use a larger
slow start initial window. Default this larger initial window to
4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
 1.89 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.88 24-Nov-2002  scw Fix a genuine uninitialised variable warning.
 1.87 02-Nov-2002  itojun cleanup ipsec.h dependency. commented by perry, sync w/kame
 1.86 13-Sep-2002  mycroft In the txsegsize bounding code, it is not necessary to adjust for the options
length.
 1.85 20-Aug-2002  thorpej Never send more than half a socket buffer of data. This ensures that
we can always keep 2 packets on the wire, no matter what SO_SNDBUF is,
and therefore ACKs will never be delayed unless we run out of data to
transmit. The problem is quite easy to tickle when the MTU of the
outgoing interface is larger than the socket buffer size (e.g. loopback).

Fix from Charles Hannum.
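
The bound described in revision 1.85 can be sketched roughly as follows. This
is only an illustration of the idea, not the committed code: the function name
bound_txsegsize() and its parameters are hypothetical, and the real kernel
works on socket buffer fields rather than plain integers.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the NetBSD tree): clamp the transmit
 * segment size to half of the send buffer, so that two segments can
 * always be outstanding regardless of the SO_SNDBUF setting.
 */
static uint32_t
bound_txsegsize(uint32_t txsegsize, uint32_t sndbuf_size)
{
	uint32_t half = sndbuf_size / 2;

	return txsegsize < half ? txsegsize : half;
}
```

With a loopback-sized segment of 32768 bytes and a 16384-byte send buffer, the
sketch yields 8192; an Ethernet-sized 1460-byte segment is left unchanged.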
 1.84 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.83 13-Jun-2002  thorpej Disable TCP Congestion Window Monitoring by default; there are
performance problems in the face of tinygrams.
 1.82 09-Jun-2002  itojun whitespace
 1.81 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.80 26-May-2002  itojun path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
 1.79 27-Apr-2002  thorpej branches: 1.79.2; 1.79.4;
* Instrument tcp_build_datapkt().
* Remove the code that allocates a cluster if the packet would
fit in one; it totally defeats doing references to M_EXT mbufs
in the socket buffer. This drastically reduces the number of
data copies in the tcp_output() path for applications which use
large writes. Kudos to Matt Thomas for pointing me in the right
direction.
 1.78 01-Mar-2002  thorpej In tcp_segsize(), move a label so that option length is considered
when using the default TCP MSS as well. From Matt Thomas.
 1.77 24-Jan-2002  itojun place NRL copyright notice itself, not a reference to it.
 1.76 03-Dec-2001  jmcneill Fix TCP segment size computation. From Rick Byersm, PR kern/14799.
 1.75 13-Nov-2001  lukem add RCSIDs
 1.74 10-Sep-2001  thorpej Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second). It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
 1.73 10-Sep-2001  thorpej Change the way receive idle time and round trip time are measured.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timestamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).

Based on similar changes in FreeBSD.
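
A minimal sketch of the wrap-safe subtraction mentioned in revision 1.73,
assuming a free-running 32-bit tick counter. The helper name ticks_elapsed()
is hypothetical and does not appear in the NetBSD sources.

```c
#include <stdint.h>

/*
 * Hypothetical helper: ticks elapsed between two samples of a
 * free-running 32-bit counter such as tcp_now. Unsigned subtraction
 * is performed modulo 2^32, so the delta is still correct if the
 * counter wrapped once between the two samples.
 */
static uint32_t
ticks_elapsed(uint32_t now, uint32_t then)
{
	return now - then;
}
```

For example, a sample taken at 0xFFFFFFFE followed by one taken at 5 gives a
delta of 7 ticks, even though the counter wrapped in between.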
 1.72 10-Sep-2001  thorpej Enable Congestion Window Monitoring by default.
 1.71 10-Sep-2001  thorpej Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
 1.70 31-Jul-2001  thorpej branches: 1.70.2;
Carve off the code that builds a TCP data packet into its own
function, and inline it, except when profiling... so we can
profile it.
 1.69 31-Jul-2001  thorpej Count the number of times we "self-quench" (ip_output() returns
ENOBUFS), and don't inline tcp_segsize() if profiling.
 1.68 26-Jul-2001  thorpej Slight cosmetic change.
 1.67 08-Jul-2001  abs branches: 1.67.2;
Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
 1.66 02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
 1.65 03-Apr-2001  itojun check ip_mtudisc only for TCP over IPv4.
PMTUD is mandatory for TCP over IPv6 (if packets > 1280).
 1.64 20-Mar-2001  thorpej Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
hash method of generating TCP ISS values. Note, this code is experimental
and disabled by default (experimental enough that I don't export the
variable via sysctl yet, either). There are a couple of issues I'd
like to discuss with Steve, so this code should only be used by people
who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
4.4BSD, timestamps are created by incrementing the tcp_now variable
at 2 Hz; there's even a company out there that uses this to determine
web server uptime. According to Newsham's paper "The Problem With
Random Increments", while NetBSD's TCP ISS generation method is much
better than the "random increment" method used by FreeBSD and OpenBSD,
it is still theoretically possible to mount an attack against NetBSD's
method if the attacker knows how many times the tcp_iss_seq variable
has been incremented. By not leaking uptime information, we can make
that much harder to determine. So, we avoid the leak by giving each
TCP connection a timebase of 0.
 1.63 24-Jan-2001  itojun branches: 1.63.2;
- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.62 06-Nov-2000  itojun fix IPv4 TTL selection with AF_INET6 API. sync with kame. From: jdc
 1.61 19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.60 17-Oct-2000  itojun be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
 1.59 17-Oct-2000  thorpej Add an IP_MTUDISC flag to the flags that can be passed to
ip_output(). This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.

This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.

Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
 1.58 28-Jul-2000  itojun forgot to call tcp6_quench(). sync with kame.
 1.57 30-Jun-2000  itojun remove old mbuf assumption (ip header and tcp header are on the same mbuf).
this is for m_pulldown use. (sync with kame)
 1.56 30-Mar-2000  augustss branches: 1.56.4;
Remove register declarations.
 1.55 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.54 09-Feb-2000  itojun optimize mbuf allocation for ip/tcp/tcpopt part.
 1.53 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.52 23-Sep-1999  itojun branches: 1.52.2; 1.52.8;
cleanup and correct TCP MSS consideration with IPsec headers.

MSS advertisement must always be:
max(if mtu) - ip hdr siz - tcp hdr siz
We violated this in the previous code so it was fixed.

tcp_mss_to_advertise() now takes af (af on wire) as its argument,
to compute right ip hdr siz.

tcp_segsize() will take care of IPsec header size.
One thing I'm not really sure is how to handle IPsec header size in
*rxsegsizep (inbound segment size estimation).
The current code subtracts possible *outbound* IPsec size from *rxsegsizep,
hoping that the peer is using the same IPsec policy as me.
It may not be applicable, could TCP guru please comment...
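
The MSS advertisement rule stated in revision 1.52 can be worked through as
below. This is a sketch under stated assumptions: the function name
mss_to_advertise() and the is_ipv6 flag are hypothetical (the real
tcp_mss_to_advertise() takes an address family), and the header sizes are the
base IPv4, IPv6 and TCP header lengths without options.

```c
#include <assert.h>
#include <stdint.h>

/* Base header sizes in bytes, without IP or TCP options. */
enum { IPV4_HDR_LEN = 20, IPV6_HDR_LEN = 40, TCP_HDR_LEN = 20 };

/*
 * Hypothetical helper: MSS to advertise =
 *     max(if mtu) - ip hdr siz - tcp hdr siz
 * The IP header size depends on the address family on the wire.
 */
static uint32_t
mss_to_advertise(uint32_t max_if_mtu, int is_ipv6)
{
	uint32_t hdrs = (is_ipv6 ? IPV6_HDR_LEN : IPV4_HDR_LEN) + TCP_HDR_LEN;

	assert(max_if_mtu > hdrs);	/* the MTU must leave room for data */
	return max_if_mtu - hdrs;
}
```

For a 1500-byte MTU this gives 1460 over IPv4 and 1440 over IPv6.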
 1.51 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.50 02-Jul-1999  fvdl Fix for -Wuninitialized warnings broke compiles without INET6, refix.
 1.49 02-Jul-1999  itojun avoid "variable not initialized" warnings on some of the platforms.
 1.48 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.47 20-Jan-1999  thorpej branches: 1.47.4; 1.47.6;
Fix a problem pointed out by Charles Hannum; DF wasn't being set in
SYN,ACK packets during Path MTU Discovery. Fix tcp_respond() to do the
appropriate route lookup and set DF as appropriate.

Also, fixup similar code in tcp_output() to relookup the route if it
is down.
 1.46 16-Dec-1998  thorpej Delay sending if SS_MORETOCOME is set in so_state. This avoids the case
where the user issued a write with a length greater than MLEN but less
than MINCLSIZE, thus causing two mbufs to be used. The loop in sosend()
would then call PRU_SEND twice, causing TCP to transmit 2 packets when
it could have transmitted one.

Suggested by Justin Walker <justin@apple.com> on the freebsd-net
mailing list.
 1.45 06-Oct-1998  matt Add a sysctl for newreno (default to off).
 1.44 04-Oct-1998  matt Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP
to NetBSD. Ignore the SACK & FACK stuff for now.
 1.43 21-Jul-1998  mycroft Implement a better fix for the `gratuitous FIN' problem, as
mentioned on tcp-impl but with a bit more commentary.
 1.42 17-Jul-1998  thorpej Add a comment wrt. a current issue w/ CWM.
 1.41 17-Jul-1998  thorpej Comment where the Restart Window is computed, and in the non-CWM case,
make sure it never _increases_ cwnd.
 1.40 07-Jul-1998  sommerfe Delete bogus (void) cast of m_freem (which is already a void function..)
 1.39 11-May-1998  thorpej Nuke TUBA per my note to tech-net; there's no reason to keep it around.
 1.38 06-May-1998  thorpej Use macros from tcp_timer.h to manipulate TCP timers, so that their
implementation can be changed easily.
 1.37 02-May-1998  thorpej Correct a comment related to Congestion Window Monitoring.
 1.36 30-Apr-1998  thorpej In the CWM code, don't use the Floyd initial window computation as
the burst size allowed, but rather a fixed number of packets, as
described in the Internet Draft. Default allowed burst is 4 packets,
per the Draft.

Make the use of CWM and the allowed burst size tunable via sysctl.
 1.35 29-Apr-1998  kml Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.
 1.34 13-Apr-1998  kml Fix to ensure that the correct MSS is advertised for loopback
TCP connections by using the MTU of the interface. Also added
a knob, mss_ifmtu, to force all connections to use the MTU of
the interface to calculate the advertised MSS.
 1.33 01-Apr-1998  thorpej Implement Congestion Window Monitoring as described in the TCPIMPL
meeting of IETF #41 by Amy Hughes <ahughes@isi.edu>, and in an upcoming
internet draft from Hughes, Touch, and Heidemann.

CWM eliminates line-rate bursts after idle periods by counting pending
(unacknowledged) packets and limiting the congestion window to the
initial congestion window plus the pending packet count. This has the
effect of allowing us to use the window as long as we continue to transmit,
but as soon as we stop transmitting, we go back to a slow-start (also known
as `use it or lose it').

This is not enabled by default. You can enable this behavior by patching
the "tcp_cwm" global (set it to non-zero) or by building a kernel with the
TCP_CWM option.
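
The window limit described in revision 1.33 can be sketched as follows. This
is an illustration of the description only, not the kernel code: the function
cwm_limit() and its parameters are hypothetical, and all quantities are counted
in segments for simplicity.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of Congestion Window Monitoring: cap the
 * congestion window at the initial window plus the number of pending
 * (unacknowledged) packets. All values are in segments.
 */
static uint32_t
cwm_limit(uint32_t cwnd, uint32_t init_cwnd, uint32_t pending)
{
	uint32_t cap = init_cwnd + pending;

	return cwnd < cap ? cwnd : cap;
}
```

With an initial window of 1 segment, a connection that has gone idle
(pending == 0) is cut back from any larger window to 1 segment, while a
connection with 10 packets in flight may use up to 11.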
 1.32 31-Mar-1998  thorpej Fix a potential-congestion case in the larger initial congestion window
code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN
(active open) or SYN,ACK (passive open) was retransmitted, the initial
congestion window for the first slow start of that connection must be
one segment.
 1.31 24-Mar-1998  kml Ensure that we take the IP option length into account when we calculate
the effective maximum send size for TCP. ip_optlen() and tcp_optlen()
should probably be inlined for efficiency.
 1.30 19-Mar-1998  kml Fix a retransmission bug introduced by the Brakmo and Peterson
RTO estimation changes. Under some circumstances it would return a value
of 0, while the old Van Jacobson RTO code would return a minimum of 3.
This would result in 12 retransmissions, each 1 second apart.
This takes care of those instances, and ensures that t_rttmin is
used everywhere as a lower bound.
 1.29 17-Mar-1998  kml Ensure that the TCP segment size reflects the size of TCP options
in the packet. This fixes a bug that was resulting in extra packets
in retransmissions (the second packet would be 12 bytes long,
reflecting the RFC1323 timestamp option size).
 1.28 19-Feb-1998  thorpej Update copyright (sigh, should have done this long ago).
 1.27 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.26 31-Dec-1997  thorpej Implement a queue for delayed ACK processing. This queue is used in
tcp_fasttimo() in lieu of scanning all open TCP connections.
 1.25 17-Dec-1997  thorpej From 4.4BSD-Lite2:
- If we fail to allocate mbufs for the outgoing segment, free the header
and abort.

From Stevens:
- Ensure the persist timer is running if the send window reaches zero.
Part of the fix for kern/2335 (pete@daemon.net).
 1.24 11-Dec-1997  thorpej Implement an infrastructure to allow larger initial congestion windows.
The sysctl'able variable "tcp_init_win", when set to 0, selects an
auto-tuning algorithm for selecting the initial window, based on transmit
segment size, per discussion in the IETF tcpimpl working group.

Default initial window is still 1 segment, but will soon become 2 segments,
per discussion in tcpimpl.
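
A sketch of how such a selection could look. The log entry does not give the auto-tuning formula, so this example borrows the one later standardized in RFC 3390, min(4*MSS, max(2*MSS, 4380 bytes)); that choice, and the helper name initial_window, are assumptions of this sketch rather than a description of what 1.24 committed.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the initial congestion window in bytes.
 * init_win == 0 selects auto-tuning (here: the RFC 3390 formula, an
 * assumption); any other value is a fixed window in segments.
 */
static uint32_t
initial_window(uint32_t init_win, uint32_t segsize)
{
	uint32_t lo, hi;

	if (init_win != 0)
		return init_win * segsize;

	lo = 2 * segsize > 4380 ? 2 * segsize : 4380;
	hi = 4 * segsize;

	return hi < lo ? hi : lo;
}
```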
 1.23 11-Dec-1997  thorpej Count delayed ACKs after they have been successfully transmitted.
 1.22 20-Nov-1997  thorpej Add missing (implied) int to a variable declaration.
 1.21 08-Nov-1997  kml TCP MSS fixes to provide cleaner slow-start and recovery.
 1.20 18-Oct-1997  kml branches: 1.20.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
 1.19 17-Oct-1997  kml Path MTU Discovery support. This is turned off by default.
Use sysctl -w net.inet.icmp.mtudisc=1 to turn on.
Still to come: path removal after some period, black hole detection
 1.18 08-Oct-1997  thorpej Fix an oversight in my previous MSS-related changes:

Basically, in silly window avoidance, don't use the raw MSS we advertised
to the peer. What we really want here is the _expected_ size of received
segments, so we need to account for the path MTU (eventually; right now,
the interface MTU for "local" addresses and loopback or tcp_mssdflt for
non-local addresses). Without this, silly window avoidance would never
kick in if we advertised a very large (e.g. ~64k) MSS to the peer.
 1.17 22-Sep-1997  thorpej Fix several annoyances related to MSS handling in BSD TCP:
- Don't overload t_maxseg. Previous behavior was to set it to the min
of the peer's advertised MSS, our advertised MSS, and tcp_mssdflt
(for non-local networks). This breaks PMTU discovery running on
either host. Instead, remember the MSS we advertise, and use it
as appropriate (in silly window avoidance).
- Per last bullet, split tcp_mss() into several functions for handling
MSS (ours and peer's), and performing various tasks when a connection
becomes ESTABLISHED.
- Introduce a new function, tcp_segsize(), which computes the max size
for every segment transmitted in tcp_output(). This will eventually
be used to hook in PMTU discovery.
 1.16 03-Jun-1997  kml branches: 1.16.4;
Fix urgent pointer overflow problems when used with large windows
 1.15 10-Dec-1996  mycroft Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.14 13-Feb-1996  christos branches: 1.14.4;
netinet prototypes
 1.13 13-Apr-1995  cgd oops; missed the chance to fix a cast, that then became a compiler warning.
 1.12 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.11 23-Jan-1995  mycroft Fix a condition where we sometimes sent a FIN too early. Also, a small
optimization.
 1.10 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 12-Apr-1994  mycroft Acks with no data should have the highest sequence number sent.
 1.7 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.6 08-Jan-1994  mycroft Prototypes.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.14.4.1 10-Dec-1996  mycroft From trunk:
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.16.4.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.16.4.1 29-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.20.2.6 09-May-1998  mycroft Pull up patch from kml.
 1.20.2.5 05-May-1998  mycroft Pull up 1.29, per request of kml.
 1.20.2.4 05-May-1998  mycroft Pull up 1.30, per request of kml.
 1.20.2.3 29-Jan-1998  mellon Pull up 1.24-1.27 (thorpej)
 1.20.2.2 21-Nov-1997  thorpej Sync w/ trunk: add a missing (previously implied) int.
 1.20.2.1 08-Nov-1997  thorpej Pull up from trunk: TCP MSS fixes to provide cleaner slow-start and recovery.
(kml)
 1.47.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.47.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.47.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.47.4.2 02-Aug-1999  thorpej Update from trunk.
 1.47.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.52.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.52.2.5 21-Apr-2001  bouyer Sync with HEAD
 1.52.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.52.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.52.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.52.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.56.4.5 24-Jan-2002  he Pull up revision 1.77 (requested by itojun):
Clean up the NRL copyright.
 1.56.4.4 06-Apr-2001  he Pull up revision 1.63 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.56.4.3 10-Nov-2000  tv Pullup 1.62 [itojun]:
fix IPv4 TTL selection with AF_INET6 API. sync with kame. From: jdc
 1.56.4.2 15-Aug-2000  itojun pullup 1.57 -> 1.58 (approved by releng-1-5)

> forgot to call tcp6_quench(). sync with kame.
 1.56.4.1 23-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

remove old mbuf assumption (ip header and tcp header are on the same mbuf).
this is for m_pulldown use. (sync with kame)

1.108 -> 1.109 syssrc/sys/netinet/tcp_input.c
1.56 -> 1.57 syssrc/sys/netinet/tcp_output.c
1.91 -> 1.92 syssrc/sys/netinet/tcp_subr.c
 1.63.2.14 11-Dec-2002  thorpej Sync with HEAD.
 1.63.2.13 11-Nov-2002  nathanw Catch up to -current
 1.63.2.12 17-Sep-2002  nathanw Catch up to -current.
 1.63.2.11 27-Aug-2002  nathanw Catch up to -current.
 1.63.2.10 20-Jun-2002  nathanw Catch up to -current.
 1.63.2.9 04-May-2002  thorpej Update from trunk.
 1.63.2.8 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.63.2.7 28-Feb-2002  nathanw Catch up to -current.
 1.63.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.63.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.63.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.63.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.63.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.63.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.67.2.8 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.67.2.7 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.67.2.6 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.67.2.5 16-Mar-2002  jdolecek Catch up with -current.
 1.67.2.4 11-Feb-2002  jdolecek Sync w/ -current.
 1.67.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.67.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.67.2.1 03-Aug-2001  lukem update to -current
 1.70.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.79.4.5 07-Feb-2004  jmc Pullup rev 1.107 (requested by itojun in ticket #1605)

Deal with IPv6 path MTU < 1280 (RFC2460 section 5 last paragraph)
Check if there really is room for TCP data.
 1.79.4.4 05-Sep-2003  tron Pull up revision 1.80 (requested by tls in ticket #1445):
path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
 1.79.4.3 30-Nov-2002  he Pull up revision 1.86 (requested by thorpej in ticket #795):
In the txsegsize bounding code, it is not necessary to adjust
for the options length.
 1.79.4.2 21-Nov-2002  he Pull up revision 1.85 (requested by thorpej in ticket #707):
Never send more than half a socket buffer of data in a
segment. This ensures that we can always keep 2 packets
on the wire, and we will therefore not cause any delayed
ACKs. Otherwise, this causes performance problems when
using large-MTU interfaces, such as the loopback interface.
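
The bound described in the 1.85 pull-up can be sketched as below. The helper name bound_segsize is hypothetical; sb_hiwat stands for the send socket buffer's high-water mark, which this sketch assumes is the "socket buffer" size the message refers to.

```c
#include <stdint.h>

/*
 * Hypothetical helper: never let a single segment exceed half of the
 * send socket buffer, so two segments can always be in flight.
 */
static uint32_t
bound_segsize(uint32_t txsegsize, uint32_t sb_hiwat)
{
	uint32_t half = sb_hiwat / 2;

	return txsegsize < half ? txsegsize : half;
}
```

On an interface whose MTU exceeds the socket buffer size, the segment size drops to half the buffer; on an ordinary Ethernet path the bound has no effect.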
 1.79.4.1 14-Jun-2002  lukem Pull up revision 1.83 (requested by thorpej in ticket #267):
Disable TCP Congestion Window Monitoring by default; there are
performance problems in the face of tinygrams.
 1.79.2.3 29-Aug-2002  gehenna catch up with -current.
 1.79.2.2 20-Jun-2002  gehenna catch up with -current.
 1.79.2.1 30-May-2002  gehenna Catch up with -current.
 1.94.2.9 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.94.2.8 01-Apr-2005  skrll Sync with HEAD.
 1.94.2.7 08-Mar-2005  skrll Sync with HEAD.
 1.94.2.6 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.94.2.5 04-Feb-2005  skrll Sync with HEAD.
 1.94.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.94.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.94.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.94.2.1 03-Aug-2004  skrll Sync with HEAD
 1.108.2.1 11-May-2004  tron Pull up revision 1.112 (requested by chs in ticket #292):
work around an LP64 problem where we report an excessively large window
due to incorrect mixing of types.
 1.115.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.115.4.1 12-Feb-2005  yamt sync with head.
 1.115.2.1 29-Apr-2005  kent sync with -current
 1.128.2.5 11-May-2005  tron Pull up revision 1.134 (requested by yamt in ticket #294):
tcp_output: account FIN when building sack option.
 1.128.2.4 11-May-2005  tron Pull up revision 1.133 (requested by yamt in ticket #293):
tcp_output: don't try to send more data than we have. PR/30160.
 1.128.2.3 11-May-2005  tron Pull up revision 1.132 (requested by yamt in ticket #293):
tcp_output: clear TH_FIN where appropriate. related to PR/30160.
 1.128.2.2 06-May-2005  tron Pull up revision 1.130 (requested by yamt in ticket #251):
fix problems related to loopback interface checksum omission. PR/29971.
- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)
ok'ed by Jason Thorpe.
 1.128.2.1 04-Apr-2005  tron Pull up revision 1.129 (requested by yamt in ticket #89):
tcp_output: lock reass queue when building sack.
 1.136.2.5 21-Jan-2008  yamt sync with head
 1.136.2.4 03-Sep-2007  yamt sync with head.
 1.136.2.3 26-Feb-2007  yamt sync with head.
 1.136.2.2 30-Dec-2006  yamt sync with head.
 1.136.2.1 21-Jun-2006  yamt sync with head.
 1.141.12.1 28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.141.10.1 19-Apr-2006  elad sync with head.
 1.141.8.2 14-Sep-2006  yamt sync with head.
 1.141.8.1 01-Apr-2006  yamt sync with head.
 1.141.6.1 22-Apr-2006  simonb Sync with head.
 1.141.4.2 09-Sep-2006  rpaulo sync with head
 1.141.4.1 05-Feb-2006  rpaulo <netinet6/in6_pcb.h> went away. Bye!
 1.143.4.2 10-Dec-2006  yamt sync with head.
 1.143.4.1 22-Oct-2006  yamt sync with head
 1.143.2.2 12-Jan-2007  ad Sync with head.
 1.143.2.1 18-Nov-2006  ad Sync with head.
 1.153.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.153.2.2 03-Apr-2011  riz Pull up following revision(s) (requested by spz in ticket #1424):
sys/netinet/tcp_output.c: revision 1.170
Clean up setting ECN bit in TOS. Fixes PR 44742
 1.153.2.1 24-May-2007  pavel branches: 1.153.2.1.4;
Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the correct default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. This
lets us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. This lets us
decompress any data, whatever the ratio of decompressed data to compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.153.2.1.4.1 03-Apr-2011  riz Pull up following revision(s) (requested by spz in ticket #1424):
sys/netinet/tcp_output.c: revision 1.170
Clean up setting ECN bit in TOS. Fixes PR 44742
 1.154.2.3 07-May-2007  yamt sync with head.
 1.154.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.154.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.157.4.1 11-Jul-2007  mjf Sync with head.
 1.157.2.3 09-Oct-2007  ad Sync with head.
 1.157.2.2 20-Aug-2007  ad Sync with HEAD.
 1.157.2.1 08-Jun-2007  ad Sync with head.
 1.159.2.2 03-Sep-2007  skrll Sync with HEAD.
 1.159.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.161.6.2 02-Aug-2007  yamt make rfbuf_ts a tcp timestamp so that calculations in tcp_input make sense.
 1.161.6.1 02-Aug-2007  yamt file tcp_output.c was added on branch matt-mips64 on 2007-08-02 13:12:36 +0000
 1.161.4.3 23-Mar-2008  matt sync with HEAD
 1.161.4.2 09-Jan-2008  matt sync with HEAD
 1.161.4.1 06-Nov-2007  matt sync with HEAD
 1.161.2.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.162.12.2 19-Jan-2008  bouyer Sync with HEAD
 1.162.12.1 02-Jan-2008  bouyer Sync with HEAD
 1.162.8.1 26-Dec-2007  ad Sync with head.
 1.162.6.1 18-Feb-2008  mjf Sync with HEAD.
 1.164.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.166.4.3 11-Mar-2010  yamt sync with head
 1.166.4.2 04-May-2009  yamt sync with head.
 1.166.4.1 16-May-2008  yamt sync with head.
 1.166.2.1 18-May-2008  yamt sync with head.
 1.167.20.2 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1973):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.167.20.1 29-Mar-2011  riz Pull up following revision(s) (requested by spz in ticket #1586):
sys/netinet/tcp_output.c: revision 1.170
Clean up setting ECN bit in TOS. Fixes PR 44742
 1.167.16.1 29-Mar-2011  riz Pull up following revision(s) (requested by spz in ticket #1586):
sys/netinet/tcp_output.c: revision 1.170
Clean up setting ECN bit in TOS. Fixes PR 44742
 1.167.14.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.167.10.2 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1973):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.167.10.1 29-Mar-2011  riz branches: 1.167.10.1.2;
Pull up following revision(s) (requested by spz in ticket #1586):
sys/netinet/tcp_output.c: revision 1.170
Clean up setting ECN bit in TOS. Fixes PR 44742
 1.167.10.1.2.1 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1973):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.167.8.1 28-Apr-2009  skrll Sync with HEAD.
 1.169.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.169.4.1 21-Apr-2011  rmind sync with head
 1.171.8.2 05-Apr-2012  mrg sync to latest -current.
 1.171.8.1 18-Feb-2012  mrg merge to -current.
 1.171.4.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.171.4.1 17-Apr-2012  yamt sync with head
 1.173.8.2 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1315):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.173.8.1 03-Nov-2014  msaitoh Pull up following revision(s) (requested by christos in ticket #1174):
sys/netinet/tcp_output.c: revision 1.178
Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks
to Jonathan Looney for pointing this out.
 1.173.6.2 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1315):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.173.6.1 03-Nov-2014  msaitoh Pull up following revision(s) (requested by christos in ticket #1174):
sys/netinet/tcp_output.c: revision 1.178
Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks
to Jonathan Looney for pointing this out.
 1.173.2.2 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #1315):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.173.2.1 03-Nov-2014  msaitoh Pull up following revision(s) (requested by christos in ticket #1174):
sys/netinet/tcp_output.c: revision 1.178
Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks
to Jonathan Looney for pointing this out.
 1.174.2.3 03-Dec-2017  jdolecek update from HEAD
 1.174.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.174.2.1 23-Jun-2013  tls resync from head
 1.175.6.1 10-Aug-2014  tls Rebase.
 1.175.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.176.2.5 24-Jul-2015  martin Pull up following revision(s) (requested by matt in ticket #886):
sys/netinet/tcp_output.c: revision 1.184
sys/netinet/tcp_input.c: revision 1.343

If we are sending a window probe and there's unacked data in the
socket, make sure at least the persist timer is running.
Make sure that snd_win doesn't go negative.
 1.176.2.4 21-Feb-2015  martin Pull up following revision(s) (requested by he in ticket #530):
sys/netinet/tcp_output.c: revision 1.180
sys/netinet/tcp_input.c: revision 1.336
sys/netinet/tcp_usrreq.c: revision 1.203
share/man/man4/tcp.4: revision 1.30
sys/netinet/tcp.h: revision 1.31
sys/netinet/tcp_subr.c: revision 1.258
sys/netinet/tcp_var.h: revision 1.176
sys/netinet/tcp_var.h: revision 1.177
sys/sys/param.h: bump revision

Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).

Change the new counter variables in struct tcpcb to uint32_t, as
per christos' comments.
 1.176.2.3 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.176.2.2 26-Oct-2014  martin Pull up following revision(s) (requested by christos in ticket #157):
sys/netinet/tcp_output.c: revision 1.178
Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks
to Jonathan Looney for pointing this out.
 1.176.2.1 24-Oct-2014  martin Pull up following revision(s) (requested by hikaru in ticket #154):
sys/netinet/tcp_output.c: revision 1.177
Fix wrong condition checking TSO capability.
ipsec_used is not necessary condition.
IPsec outbound policy will not be checked when ipsec_used is false.
 1.179.2.6 28-Aug-2017  skrll Sync with HEAD
 1.179.2.5 05-Feb-2017  skrll Sync with HEAD
 1.179.2.4 09-Jul-2016  skrll Sync with HEAD
 1.179.2.3 22-Sep-2015  skrll Sync with HEAD
 1.179.2.2 06-Jun-2015  skrll Sync with HEAD
 1.179.2.1 06-Apr-2015  skrll Sync with HEAD
 1.186.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.186.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.194.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.196.2.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that update command must be used with add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly-added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be obtained from
another argument (isr)
---
Fix splx isn't called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is already
sorted as we need.

The newly added key_validate_savlist validates that each sav list is indeed
sorted (run only if DEBUG because it's not cheap).
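
The kind of check described above can be sketched as follows. This is only an
illustration, not the kernel's key_validate_savlist: struct sav_sketch and
savlist_is_sorted are hypothetical names, a plain singly linked list stands in
for the real list type, and the ascending sort direction is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an SA entry; only the sort key is modelled. */
struct sav_sketch {
	uint64_t addtime;		/* plays the role of lft_c->sadb_lifetime_addtime */
	struct sav_sketch *next;
};

/*
 * Return true if the list is ordered by addtime (ascending is assumed here).
 * The walk is O(n), which is why a check like this is worth running only in
 * debug builds.
 */
bool
savlist_is_sorted(const struct sav_sketch *head)
{
	const struct sav_sketch *sav;

	for (sav = head; sav != NULL && sav->next != NULL; sav = sav->next) {
		if (sav->addtime > sav->next->addtime)
			return false;
	}
	return true;
}
```

With a sorted list, packet processing can take the first usable entry instead
of comparing the freshness of every entry.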
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of an mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless.
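
A minimal sketch of the arithmetic, assuming the soft limit is derived from the
hard limit as the message describes. struct comb_sketch and comb_set_lifetime
are hypothetical names; the real sadb_comb structure has more fields and the
real function sets several lifetimes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct sadb_comb. */
struct comb_sketch {
	uint64_t hard_addtime;	/* hard lifetime limit, in seconds */
	uint64_t soft_addtime;	/* soft lifetime limit, in seconds */
};

/*
 * Set the hard limit and derive the soft limit as 80% of it.
 * Scaling the soft field by itself would keep it at 0 when the
 * structure has just been zero-cleared.
 */
void
comb_set_lifetime(struct comb_sketch *comb, uint64_t hard_addtime)
{
	assert(comb != NULL);
	comb->hard_addtime = hard_addtime;
	comb->soft_addtime = hard_addtime * 80 / 100;
}
```

For a hard limit of 86400 seconds this yields a soft limit of 69120 seconds,
whereas 0 * 80 / 100 stays 0.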
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms with holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), which is of course useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock which happens for example on the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C that is a callee of xc_broadcast of thread A sticks on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock happens only if NET_MPSAFE is enabled.
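
The shape of the fix can be sketched in userland C. This is an illustration
under stated assumptions, not the committed kernel code: POSIX threads stand in
for kernel mutexes and condvars, wait_for_readers is a hypothetical stand-in
for pserialize_perform, and struct removal_gate and gate_remove are invented
names. Where localcount_drain goes in the real sequence is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state guarding the "unlink, then wait for readers" sequence. */
struct removal_gate {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool sync_in_progress;	/* true while one thread waits for readers */
	int sync_calls;		/* demonstration counter only */
};

#define REMOVAL_GATE_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 }

/* Stand-in for pserialize_perform(); called WITHOUT the mutex held. */
static void
wait_for_readers(struct removal_gate *g)
{
	g->sync_calls++;	/* safe: serialized by sync_in_progress */
}

/* Unlink an item (modelled as clearing *linked) and wait for readers. */
void
gate_remove(struct removal_gate *g, bool *linked)
{
	assert(g != NULL && linked != NULL);

	pthread_mutex_lock(&g->mtx);
	/* The flag and condvar keep two threads out of the wait at once. */
	while (g->sync_in_progress)
		pthread_cond_wait(&g->cv, &g->mtx);
	*linked = false;		/* list update under the mutex */
	g->sync_in_progress = true;
	pthread_mutex_unlock(&g->mtx);	/* release before the blocking wait */

	wait_for_readers(g);

	pthread_mutex_lock(&g->mtx);
	g->sync_in_progress = false;
	pthread_cond_broadcast(&g->cv);
	pthread_mutex_unlock(&g->mtx);
}
```

Because the mutex is dropped before the blocking wait, a callback that needs
the same mutex can still make progress, while the flag keeps two waits from
running at the same time.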
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected for example key_acquire.
---
Fix SP is broken on transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.198.2.6 18-Jan-2019  pgoyette Synch with HEAD
 1.198.2.5 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.198.2.4 21-May-2018  pgoyette Sync with HEAD
 1.198.2.3 07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.198.2.2 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.198.2.1 15-Mar-2018  pgoyette Synch with HEAD
 1.208.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.208.2.1 10-Jun-2019  christos Sync with HEAD
 1.218.2.1 21-Sep-2023  martin Pull up following revision(s) (requested by bouyer in ticket #377):

sys/netinet/tcp_output.c: revision 1.219

Handle EHOSTDOWN the same way as EHOSTUNREACH and ENETDOWN for established
connections. Avoid premature end of tcp connection with "Host is down" error
in case of transient link-layer failure.

Discussed and patch proposed in
http://mail-index.netbsd.org/tech-net/2023/09/11/msg008610.html
and followups.
 1.220.2.1 02-Aug-2025  perseant Sync with HEAD
 1.6 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
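
A sketch of the difference between an alignment-based and a size-based check,
in standard C11. The macros below are illustrative and are not the kernel's
definitions: the _SKETCH names, SKETCH_STRICT_ALIGNMENT, struct three_shorts
and load_u32_checked are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed build-time switch: 1 if the CPU traps on unaligned accesses. */
#ifndef SKETCH_STRICT_ALIGNMENT
#define SKETCH_STRICT_ALIGNMENT 1
#endif

/* True if p satisfies the ABI alignment of type t (not its size). */
#define ALIGNED_POINTER_SKETCH(p, t) \
	((((uintptr_t)(p)) & (_Alignof(t) - 1)) == 0)

/* Same test, but always true when unaligned accesses are allowed. */
#if SKETCH_STRICT_ALIGNMENT
#define ACCESSIBLE_POINTER_SKETCH(p, t)	ALIGNED_POINTER_SKETCH(p, t)
#else
#define ACCESSIBLE_POINTER_SKETCH(p, t)	true
#endif

/*
 * A non-primitive type: its size (typically 6) is not a power of two,
 * so a sizeof-based mask would be meaningless for it.
 */
struct three_shorts {
	uint16_t a, b, c;
};

/* Load a 32-bit word only after checking that the access is permitted. */
uint32_t
load_u32_checked(const uint32_t *p)
{
	assert(ACCESSIBLE_POINTER_SKETCH(p, uint32_t));
	return *p;
}
```

Every element of an array of struct three_shorts passes the alignment-based
check, which a mask built from sizeof could not express.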
 1.5 17-Feb-2021  christos - pass the alignment instead of the mask (as Roy asked and to match the
other macro)
- use alignof to determine that alignment and CTASSERT what we expect
- remove unused macros
 1.4 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.3 28-Apr-2008  martin branches: 1.3.4; 1.3.102;
Remove clause 3 and 4 from TNF licenses
 1.2 23-Apr-2008  thorpej branches: 1.2.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.1 12-Apr-2008  thorpej branches: 1.1.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.1.2.1 18-May-2008  yamt sync with head.
 1.2.2.1 16-May-2008  yamt sync with head.
 1.3.102.1 03-Apr-2021  thorpej Sync with HEAD.
 1.3.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.4.1 28-Apr-2008  mjf file tcp_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:25 +0000
 1.36 18-May-2018  maxv IP6_EXTHDR_GET -> M_REGION_GET, no functional change.
 1.35 03-May-2018  maxv Remove now unused tcpip.h includes. Some were already unused before.
 1.34 29-Mar-2018  maxv Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to
understand.

Also make tcp6_mtudisc() static in tcp_subr.c.
 1.33 13-Dec-2016  ozaki-r branches: 1.33.14;
Remove unnecessary inclusions of nd6.h
 1.32 24-Aug-2015  pooka branches: 1.32.2;
sprinkle _KERNEL_OPT
 1.31 31-Mar-2015  ozaki-r Remove unnecessary opt_ipsec.h inclusions
 1.30 10-Nov-2014  maxv branches: 1.30.2;
Do not uselessly include <sys/malloc.h>.
 1.29 12-Nov-2013  kefren branches: 1.29.4;
* implement TCP CUBIC congestion control algorithm
* move tcp_sack_newack bits inside reno and newreno_fast_retransmit_newack
* notify ECN peer about cwnd shrink in [new]reno_slow_retransmit

Based on the patch proposed on tech-net@ on Nov 7 with minor improvements:
* adapt wmax for no-fast convergence case
* correct cbrt calculation for big window sizes (>750KB)
 1.28 30-Jan-2012  matt branches: 1.28.6; 1.28.10;
Use proper ANSI prototypes for foo() -> foo(void)
Caught when compiling with -Wold-style-definition
 1.27 17-Jul-2011  joerg branches: 1.27.2; 1.27.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.26 14-Apr-2011  yamt - comments
- whitespace
 1.25 27-May-2009  pooka branches: 1.25.4; 1.25.6;
POOL_INIT -> pool_init
 1.24 28-Apr-2008  martin branches: 1.24.14;
Remove clause 3 and 4 from TNF licenses
 1.23 12-Mar-2007  ad branches: 1.23.34; 1.23.36; 1.23.38;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.22 21-Oct-2006  yamt branches: 1.22.4; 1.22.8;
add sack_dump(), a function to dump sack holes, if defined(DDB).
 1.21 21-Oct-2006  yamt - constify.
- make tcp_dooptions and tcpipqent_pool static.
 1.20 20-Oct-2006  reinoud Fix alignment problems causing regular panics in tcp_sack_option on
NetBSD/alpha and NetBSD/sparc. This fixes PR#34751.

The problem most likely started to show in gcc4 and is caused by the use of
a casting to an uint32_t pointer that is later copied from using memcpy.
Gcc detects the copying of 4 bytes from an uint32_t pointer and decides to
just replace it with an aligned copy causing the trap.

Fix provided by Izumi Tsutsui and ok'd by Martin.
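
One common way to avoid this class of trap is to keep the source typed as a
byte pointer, so the compiler cannot assume 4-byte alignment, and to copy into
a local before converting. The sketch below illustrates that idea only; it is
not claimed to be the committed fix, and read_be32_unaligned is a hypothetical
helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* ntohl() */

/*
 * Read a 32-bit network-order value from a possibly unaligned position in
 * an option buffer. The source stays an unsigned char pointer, so memcpy
 * must be compiled as a byte-safe copy rather than an aligned word load.
 */
uint32_t
read_be32_unaligned(const unsigned char *bytes)
{
	uint32_t raw;

	assert(bytes != NULL);
	memcpy(&raw, bytes, sizeof(raw));
	return ntohl(raw);
}
```

Casting the buffer to a uint32_t pointer first and then calling memcpy tells
the compiler that the source is aligned, which is the situation the message
above describes.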
 1.19 07-Oct-2006  yamt - make sackhole_pool static.
- unify duplicated hole allocation and accounting code.
(no functional changes.)
 1.18 07-Oct-2006  yamt revert tcp_sack.c rev.1.15 because it's unnecessary.
all callers of these functions are at splsoftnet already:
tcp_sack_option
tcp_input ok

tcp_del_sackholes
tcp_input ok

tcp_free_sackholes
tcp_close ok
tcp_timer_rexmt ok
tcp_timer_2msl ok
 1.17 07-Oct-2006  yamt tcp_sack_output: whitespace.
 1.16 07-Oct-2006  yamt tcp_del_sackholes: whitespace.
 1.15 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.14 11-Dec-2005  christos branches: 1.14.4; 1.14.20; 1.14.22;
merge ktrace-lwp.
 1.13 08-May-2005  yamt branches: 1.13.2;
tcp_sack_option: ignore sack beyond snd_max.
 1.12 05-Apr-2005  kurahone branches: 1.12.2;
Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.

Idea taken from FreeBSD.
 1.11 18-Mar-2005  kurahone branches: 1.11.2;
TCP/SACK changes from FreeBSD.

Ignore the SACK option if
* The packet is not an ACK.
* The ACK is outside of snd_una -> snd_max
 1.10 16-Mar-2005  yamt branches: 1.10.2;
simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
build it directly from the segment list when needed.
 1.9 16-Mar-2005  yamt - use full sized segments unless we actually have SACKs to send.
- avoid TSO duplicate D-SACK.
- send SACKs regardless of TF_ACKNOW.
- don't clear rcv_sack_num when transmitting.

discussed on tech-net@.
 1.8 08-Mar-2005  yamt tcp_sack_option: handle the case that the right-most sack'ed block is expanded.
a fix from Noritoshi Demizu (FreeBSD PR/78226) via Kentaro A. Kurahone.
 1.7 07-Mar-2005  yamt tcp_sack_option: fix the cases that some sack blocks go into a hole.
 1.6 07-Mar-2005  yamt tcp_sack_option: fix a typo(?), which can cause to ignore valid blocks.
 1.5 07-Mar-2005  yamt tcp_sack_option: the max number of sack blocks in a packet is 4, not 3.
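
The limit of 4 follows from the TCP header layout: at most 40 bytes are
available for options, and a SACK option uses 2 bytes of kind and length plus
8 bytes per block. A small sketch of that arithmetic, with hypothetical names
(max_sack_blocks and the enum constants are not taken from the kernel source):

```c
#include <assert.h>

enum {
	TCP_MAX_OPTION_BYTES = 40,	/* 60-byte maximum header minus 20 fixed bytes */
	SACK_OPTION_HEADER_BYTES = 2,	/* kind + length */
	SACK_BLOCK_BYTES = 8		/* left edge + right edge, 4 bytes each */
};

/* Number of SACK blocks that fit in the given amount of option space. */
int
max_sack_blocks(int option_bytes_available)
{
	assert(option_bytes_available >= 0 &&
	    option_bytes_available <= TCP_MAX_OPTION_BYTES);
	if (option_bytes_available < SACK_OPTION_HEADER_BYTES + SACK_BLOCK_BYTES)
		return 0;
	return (option_bytes_available - SACK_OPTION_HEADER_BYTES) /
	    SACK_BLOCK_BYTES;
}
```

With all 40 option bytes free the result is 4; with 12 bytes taken by a padded
timestamp option, 28 bytes remain and the result is 3.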
 1.4 06-Mar-2005  yamt - unwrap short lines.
- remove unneeded parenthesis.
- whitespace.
 1.3 06-Mar-2005  yamt don't assume alignment of sack options.
 1.2 06-Mar-2005  yamt wrap long lines.
 1.1 28-Feb-2005  jonathan branches: 1.1.2;
Commit TCP SACK patches from Kentaro A. Kurahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond
to SACKs in wide-area traffic.

There are two independently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays in fast retransmission
(potentially explainable by a 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unexplained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over. Current hypothesis
is that the anomalies above may in fact be due to link-level (hardware,
driver, HAL, firmware) aberrations in the test setup, affecting both
Kentaro's wired-Ethernet NIC and my two (different) WiFi NICs.
 1.1.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.1.2.4 01-Apr-2005  skrll Sync with HEAD.
 1.1.2.3 08-Mar-2005  skrll Sync with HEAD.
 1.1.2.2 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.2.1 28-Feb-2005  skrll file tcp_sack.c was added on branch ktrace-lwp on 2005-03-04 16:53:29 +0000
 1.10.2.4 11-Nov-2006  bouyer Pull up following revision(s) (requested by reinoud in ticket #1561):
sys/netinet/tcp_sack.c: revision 1.20
Fix alignment problems causing regular panics in tcp_sack_option on
NetBSD/alpha and NetBSD/sparc. This fixes PR#34751.
The problem most likely started to show in gcc4 and is caused by the use of
a casting to an uint32_t pointer that is later copied from using memcpy.
Gcc detects the copying of 4 bytes from an uint32_t pointer and decides to
just replace it with an aligned copy causing the trap.
Fix provided by Izumi Tsutsui and ok'd by Martin.
 1.10.2.3 11-May-2005  tron branches: 1.10.2.3.2; 1.10.2.3.4;
Pull up revision 1.13 (requested by yamt in ticket #293):
tcp_sack_option: ignore sack beyond snd_max.
 1.10.2.2 06-May-2005  tron Pull up revision 1.12 (requested by kurahone in ticket #199):
Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.
Idea taken from FreeBSD.
 1.10.2.1 06-May-2005  tron Pull up revision 1.11 (requested by yamt in ticket #250):
TCP/SACK changes from FreeBSD.
Ignore the SACK option if
* The packet is not an ACK.
* The ACK is outside of snd_una -> snd_max
 1.10.2.3.4.1 11-Nov-2006  bouyer Pull up following revision(s) (requested by reinoud in ticket #1561):
sys/netinet/tcp_sack.c: revision 1.20
Fix alignment problems causing regular panics in tcp_sack_option on
NetBSD/alpha and NetBSD/sparc. This fixes PR#34751.
The problem most likely started to show in gcc4 and is caused by the use of
a casting to an uint32_t pointer that is later copied from using memcpy.
Gcc detects the copying of 4 bytes from an uint32_t pointer and decides to
just replace it with an aligned copy causing the trap.
Fix provided by Izumi Tsutsui and ok'd by Martin.
 1.10.2.3.2.1 11-Nov-2006  bouyer Pull up following revision(s) (requested by reinoud in ticket #1561):
sys/netinet/tcp_sack.c: revision 1.20
Fix alignment problems causing regular panics in tcp_sack_option on
NetBSD/alpha and NetBSD/sparc. This fixes PR#34751.
The problem most likely started to show in gcc4 and is caused by the use of
a casting to an uint32_t pointer that is later copied from using memcpy.
Gcc detects the copying of 4 bytes from an uint32_t pointer and decides to
just replace it with an aligned copy causing the trap.
Fix provided by Izumi Tsutsui and ok'd by Martin.
 1.11.2.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.11.2.1 18-Mar-2005  yamt file tcp_sack.c was added on branch yamt-km on 2005-03-19 08:36:38 +0000
 1.12.2.2 29-Apr-2005  kent sync with -current
 1.12.2.1 05-Apr-2005  kent file tcp_sack.c was added on branch kent-audio2 on 2005-04-29 11:29:34 +0000
 1.13.2.2 03-Sep-2007  yamt sync with head.
 1.13.2.1 30-Dec-2006  yamt sync with head.
 1.14.22.1 22-Oct-2006  yamt sync with head
 1.14.20.1 18-Nov-2006  ad Sync with head.
 1.14.4.1 05-Feb-2006  rpaulo <netinet6/in6_pcb.h> went away. Bye!
 1.22.8.1 13-Mar-2007  ad Sync with head.
 1.22.4.1 24-Mar-2007  yamt sync with head.
 1.23.38.2 20-Jun-2009  yamt sync with head
 1.23.38.1 16-May-2008  yamt sync with head.
 1.23.36.1 18-May-2008  yamt sync with head.
 1.23.34.1 02-Jun-2008  mjf Sync with HEAD.
 1.24.14.1 23-Jul-2009  jym Sync with HEAD.
 1.25.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.25.4.1 21-Apr-2011  rmind sync with head
 1.27.6.1 18-Feb-2012  mrg merge to -current.
 1.27.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.27.2.1 17-Apr-2012  yamt sync with head
 1.28.10.1 18-May-2014  rmind sync with head
 1.28.6.2 03-Dec-2017  jdolecek update from HEAD
 1.28.6.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.29.4.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.30.2.3 05-Feb-2017  skrll Sync with HEAD
 1.30.2.2 22-Sep-2015  skrll Sync with HEAD
 1.30.2.1 06-Apr-2015  skrll Sync with HEAD
 1.32.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.33.14.2 21-May-2018  pgoyette Sync with HEAD
 1.33.14.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.17 25-Jul-2014  ryo fix some case of reference to uninitialized tp->snd_fack.
This bug causes FINs to be dropped mistakenly.
pointed out in PR/48283 by YASUOKA Masahiko, thanks!
 1.16 10-Dec-2005  elad branches: 1.16.120; 1.16.136;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.15 05-Apr-2005  kurahone branches: 1.15.2;
Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.

Idea taken from FreeBSD.
 1.14 16-Feb-2005  briggs branches: 1.14.4;
Initialize snd_high as part of tcp_sendseqinit().
From Kentaro A. Kurahone.
 1.13 07-Aug-2003  agc branches: 1.13.8; 1.13.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.12 04-Oct-1998  matt branches: 1.12.46;
Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP
to NetBSD. Ignore the SACK & FACK stuff for now.
 1.11 04-Sep-1998  mycroft Make the randomized part of the ISS 24 bits.
 1.10 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.9 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.8 13-Oct-1997  explorer o Make usage of /dev/random dependent on
pseudo-device rnd # /dev/random and in-kernel generator
in config files.

o Add declaration to all architectures.

o Clean up copyright message in rnd.c, rnd.h, and rndpool.c to include
that this code is derived in part from Ted Ts'o's Linux code.
 1.7 10-Oct-1997  explorer Add hooks to use the kernel random system to generate TCP sequence numbers.
 1.6 26-Mar-1995  jtc branches: 1.6.14;
KERNEL -> _KERNEL
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6.14.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.12.46.6 11-Dec-2005  christos Sync with head.
 1.12.46.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.12.46.4 17-Feb-2005  skrll Sync with HEAD.
 1.12.46.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.12.46.2 18-Sep-2004  skrll Sync with HEAD.
 1.12.46.1 03-Aug-2004  skrll Sync with HEAD
 1.13.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.13.8.1 29-Apr-2005  kent sync with -current
 1.14.4.1 06-May-2005  tron Pull up revision 1.15 (requested by kurahone in ticket #199):
Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.
Idea taken from FreeBSD.
 1.15.2.1 21-Jun-2006  yamt sync with head.
 1.16.136.1 10-Aug-2014  tls Rebase.
 1.16.120.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.298 26-Feb-2025  andvar Fix typos in comments, mainly s/calcurate/calculate/.
 1.297 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.296 04-Nov-2022  ozaki-r branches: 1.296.8;
inpcb: rename functions to in6pcb_*
 1.295 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.294 31-Oct-2022  ozaki-r tcp: fix wrong logic in tcp_drop

Pointed out by mlelstv@
 1.293 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
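
The embedding described above can be illustrated with a minimal sketch. Only the names struct inpcb, struct in4pcb, struct in6pcb and in4p_laddr() come from the commit message; every field name and the layout below are assumptions made for illustration, not the actual NetBSD definitions.

```c
#include <stdint.h>

/* Common part shared by both address families (fields are hypothetical). */
struct inpcb {
	int      inp_af;     /* address family tag */
	uint16_t inp_lport;  /* local port */
};

/* IPv4 PCB: embeds the common part first, then IPv4-only data. */
struct in4pcb {
	struct inpcb in4p_pcb;
	uint32_t     in4p_ip_laddr;  /* hypothetical storage for the local address */
};

/* IPv6 PCB: same common part, but larger address storage. */
struct in6pcb {
	struct inpcb in6p_pcb;
	uint8_t      in6p_ip6_laddr[16];
};

/*
 * Accessor in the style the commit message names. The cast is valid only
 * when inp really points at the first member of a struct in4pcb.
 */
#define in4p_laddr(inp) (((struct in4pcb *)(inp))->in4p_ip_laddr)
```

Because the common struct is the first member, code that only needs shared fields can keep passing a struct inpcb pointer, while IPv4 sockets no longer pay for the 16-byte IPv6 address storage.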
 1.292 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.291 20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.290 27-Jun-2022  knakahara Remove confusing comment.

The comment was added in tcp_subr.c:r1.124 (2002/03/15).
tcp_drain() is called from softint context only, now.
 1.289 31-Jul-2021  andvar s/threshhold/threshold
 1.288 09-Mar-2021  christos branches: 1.288.4;
Move the offset addition in one place and mask the randomly generated value
to make sure that the isn is monotonic.
 1.287 08-Mar-2021  christos Remove the unused "addin" argument (it was always 0) and go back using
a random iss by default (instead of rfc1948)
 1.286 08-Mar-2021  christos Don't increment the iss sequence on each connection because it exposes
information (Amit Klein)
 1.285 07-Mar-2021  christos netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)
 1.284 12-Jun-2020  roy branches: 1.284.2;
Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.283 06-Aug-2019  riastradh Clamp tcp timer quantities to reasonable ranges.

Reported-by: syzbot+259675123340bf46a6de@syzkaller.appspotmail.com
 1.282 27-Dec-2018  maxv branches: 1.282.4;
Remove unused arguments.
 1.281 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
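
The truncation hazard described above can be reproduced with a small sketch. It assumes a platform where unsigned int is 32 bits; uimin() here is a local stand-in with the same shape as the libkern helper, and clamp_truncating() and clamp_exact() are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Local stand-in for the libkern helper: defined on unsigned int only. */
static inline unsigned int
uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Macro form: no truncation, but it may evaluate an argument twice. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Compare two 64-bit values through the unsigned int helper. The casts
 * spell out the narrowing that a plain call would perform implicitly.
 */
uint64_t
clamp_truncating(uint64_t len, uint64_t limit)
{
	return uimin((unsigned int)len, (unsigned int)limit);
}

/* Compare the same values at full width with the macro. */
uint64_t
clamp_exact(uint64_t len, uint64_t limit)
{
	return MIN(len, limit);
}
```

With len = 0x100000001 and limit = 0x200000000, the helper sees 1 and 0 after narrowing and returns 0, while the macro returns 0x100000001. The name uimin makes that narrowing visible at the call site.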
 1.280 23-May-2018  maxv branches: 1.280.2;
Add XXX.
 1.279 03-May-2018  maxv Remove now unused tcpip.h includes. Some were already unused before.
 1.278 18-Apr-2018  maxv Remove unused netipsec/xform.h includes.
 1.277 18-Apr-2018  maxv Remove misleading comments.
 1.276 29-Mar-2018  maxv Remove TCPREASS_DEBUG. It was introduced 20 years ago when the reassembler
was being developed, but it's irrelevant today. Makes the code clearer.
 1.275 29-Mar-2018  maxv Misc changes; no real functional change.
 1.274 29-Mar-2018  maxv Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to
understand.

Also make tcp6_mtudisc() static in tcp_subr.c.
 1.273 26-Feb-2018  maxv branches: 1.273.2;
Dedup: merge ipsec4_hdrsiz and ipsec6_hdrsiz into ipsec_hdrsiz.

ok ozaki-r@
 1.272 19-Jan-2018  ozaki-r Run tcp_slowtimo in workqueue if NET_MPSAFE

If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.

NFCI for !NET_MPSAFE
 1.271 29-Jul-2017  maxv Forgot to commit this file yesterday.
 1.270 03-Mar-2017  ozaki-r branches: 1.270.6;
Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.269 02-Jan-2017  christos branches: 1.269.2;
Fix TCP signature code:
1. pack options more tightly instead of being generous with no/op
2. put TCP_SIGNATURE option before SACK
3. fix computation of options length, by deferring it
XXX: Really we should move the options setting code in one place instead
of having two copies one for input and one for output.
XXX: tcp_optlen/tcp_hdrsiz need to be fixed; they were wrong before too.
 1.268 08-Dec-2016  ozaki-r Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.267 09-Nov-2016  ozaki-r Cleanup/KNF tcp6_mtudisc

No functional change.
 1.266 10-Jun-2016  ozaki-r branches: 1.266.2;
Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterparts of m_get_rcvif, which will come in another
commit, hide the internals of rcvif operations, and reduce the diff of
the upcoming change.

No functional change.
 1.265 15-Feb-2016  rtr Reduce code duplication.

Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended. As posted to tech-net@
 1.264 07-Sep-2015  ozaki-r Refactor tcp_mtudisc

No functional change.
 1.263 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.262 19-May-2015  kefren Use RUN_ONCE to initialize iss secret. Suggested by riastradh@
 1.261 16-May-2015  kefren Don't overexpose tcp_iss_secret and don't bother compute it unless
RFC1948 compliance is activated
 1.260 27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.259 13-Apr-2015  riastradh cprng_strong(kern_cprng, ...) never blocks, pass 0 for flags.

FASYNC was wrong anyway! It's FNONBLOCK.
 1.258 14-Feb-2015  he Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).
 1.257 10-Nov-2014  maxv branches: 1.257.2;
Do not uselessly include <sys/malloc.h>.
 1.256 05-Sep-2014  matt Don't use C++ keyword (template) as variable.
 1.255 16-Mar-2014  dholland branches: 1.255.4;
Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
 1.254 02-Jan-2014  pooka need atomic.h, from uwe
 1.253 02-Jan-2014  pooka Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.252 23-Nov-2013  christos convert from CIRCLEQ to TAILQ.
 1.251 12-Nov-2013  kefren * implement TCP CUBIC congestion control algorithm
* move tcp_sack_newack bits inside reno and newreno_fast_retransmit_newack
* notify ECN peer about cwnd shrink in [new]reno_slow_retransmit

Based on the patch proposed on tech-net@ on Nov 7 with minor improvements:
* adapt wmax for no-fast convergence case
* correct cbrt calculation for big window sizes (>750KB)
 1.250 05-Jun-2013  christos branches: 1.250.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.249 10-Apr-2013  christos Limit the tcp initial window setting to 10, leaving it by default to 4
and simplifying the code in the process. Per draft-ietf-initcwnd-08.txt.
 1.248 08-Sep-2012  msaitoh branches: 1.248.2;
Fix a bug that kmem_alloc() is called from the interrupt context.
 1.247 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.246 31-Dec-2011  christos branches: 1.246.2;
- fix offsetof usage, and redundant defines
- kill pointer casts to 0
 1.245 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.244 17-Dec-2011  tls Separate /dev/random pseudodevice implementation from kernel entropy pool
implementation. Rewrite pseudodevice code to use cprng_strong(9).

The new pseudodevice is cloning, so each caller gets bits from a stream
generated with its own key. Users of /dev/urandom get their generators
keyed on a "best effort" basis -- the kernel will rekey generators
whenever the entropy pool hits the high water mark -- while users of
/dev/random get their generators rekeyed every time key-length bits
are output.

The underlying cprng_strong API can use AES-256 or AES-128, but we use
AES-128 because of concerns about related-key attacks on AES-256. This
improves performance (and reduces entropy pool depletion) significantly
for users of /dev/urandom but does cause users of /dev/random to rekey
twice as often.

Also fixes various bugs (including some missing locking and a reseed-counter
overflow in the CTR_DRBG code) found while testing this.

For long reads, this generator is approximately 20 times as fast as the
old generator (dd with bs=64K yields 53MB/sec on 2Ghz Core2 instead of
2.5MB/sec) and also uses a separate mutex per instance so concurrency
is greatly improved. For reads of typical key sizes for modern
cryptosystems (16-32 bytes) performance is about the same as the old
code: a little better for 32 bytes, a little worse for 16 bytes.
 1.243 19-Nov-2011  tls branches: 1.243.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator uses AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.242 31-Oct-2011  yamt branches: 1.242.2;
tcp_drain: grab softnet_lock where appropriate
 1.241 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
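
The classification above can be sketched in a few lines of C. The three classes and the default MSLs (2, 10 and 60 seconds) come from the commit message; the function names, the IPv4-only host-order addresses, and the netmask comparison used as the "same subnet" test are simplifying assumptions, not the kernel's actual logic.

```c
#include <stdint.h>

/* The three peer classes named in the commit message. */
enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE };

/* Default MSL per class, in seconds, as stated in the commit message. */
static const int mslt_default_msl[] = { 2, 10, 60 };

/*
 * Classify a session by the nearness of its peer. Addresses and netmask
 * are IPv4 values in host byte order (an assumption of this sketch).
 */
enum mslt_class
mslt_classify(uint32_t laddr, uint32_t raddr, uint32_t netmask)
{
	if (laddr == raddr)
		return MSLT_LOOPBACK;	/* local host equals remote host */
	if ((laddr & netmask) == (raddr & netmask))
		return MSLT_LOCAL;	/* same link/subnet */
	return MSLT_REMOTE;		/* reached via one or more gateways */
}

/* Look up the MSL that a session of the given class would use. */
int
mslt_msl_seconds(enum mslt_class c)
{
	return mslt_default_msl[c];
}
```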

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
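
The index-instead-of-pointer idea can be sketched as follows. This is a toy model under stated assumptions: the names (struct vpcb, struct vpool, vpool_*), the single integer key standing in for the connection tuple, and the tiny pool size are all hypothetical, and the real VTW layout is considerably more elaborate.

```c
#include <stdint.h>

#define VP_NENT	4		/* pool size; tiny for illustration */
#define VP_NBKT	4		/* number of hash buckets */
#define VP_NIL	UINT16_MAX	/* "null" index */

struct vpcb {			/* hypothetical compact entry */
	uint32_t key;		/* stand-in for the connection tuple */
	uint16_t next;		/* index of next entry in the hash chain */
};

struct vpool {
	struct vpcb ent[VP_NENT];	/* fixed-size pool */
	uint16_t bucket[VP_NBKT];	/* chain heads, as indices */
	uint16_t head;			/* next slot to use; the oldest when full */
	uint16_t count;
};

void
vpool_init(struct vpool *p)
{
	for (int i = 0; i < VP_NBKT; i++)
		p->bucket[i] = VP_NIL;
	p->head = 0;
	p->count = 0;
}

/* Remove entry idx from its hash chain by following 16-bit links. */
static void
vpool_unlink(struct vpool *p, uint16_t idx)
{
	uint16_t *link = &p->bucket[p->ent[idx].key % VP_NBKT];

	while (*link != idx)
		link = &p->ent[*link].next;
	*link = p->ent[idx].next;
}

/* Insert a key; when the pool is full, the oldest entry is discarded. */
void
vpool_insert(struct vpool *p, uint32_t key)
{
	uint16_t idx = p->head;

	if (p->count == VP_NENT)
		vpool_unlink(p, idx);	/* evict oldest first */
	else
		p->count++;
	p->ent[idx].key = key;
	p->ent[idx].next = p->bucket[key % VP_NBKT];
	p->bucket[key % VP_NBKT] = idx;
	p->head = (uint16_t)((idx + 1) % VP_NENT);
}

/* Return 1 if the key is present, 0 otherwise. */
int
vpool_lookup(const struct vpool *p, uint32_t key)
{
	for (uint16_t i = p->bucket[key % VP_NBKT]; i != VP_NIL; i = p->ent[i].next)
		if (p->ent[i].key == key)
			return 1;
	return 0;
}
```

Each link costs two bytes instead of a pointer's eight on a 64-bit machine, which is the saving the commit message attributes to fixed-size pools.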

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.240 03-May-2011  dyoung *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.
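
The deferral pattern above might look roughly like the sketch below. All names (proto_drain, proto_fasttimo, drain_needed, drain_runs) are hypothetical, and C11 atomics stand in for whatever synchronization the kernel actually uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool drain_needed;	/* set by the drain hook, cleared by the timer */
int drain_runs;			/* counts how often the real work ran */

/* The expensive part; in this sketch it only counts invocations. */
static void
do_drain_work(void)
{
	drain_runs++;
}

/* May be called with locks held, so it only records the request. */
void
proto_drain(void)
{
	atomic_store(&drain_needed, true);
}

/* Periodic timer handler: performs the deferred work at most once per request batch. */
void
proto_fasttimo(void)
{
	if (atomic_exchange(&drain_needed, false))
		do_drain_work();
}
```

Several drain requests between two timer ticks collapse into a single pass, and the caller of the drain hook never takes a lock.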

Contributed by Coyote Point Systems, Inc.
 1.239 20-Apr-2011  gdt Rewrite comments about TCP RTO calculations.

Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code. This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073. Approved for Public
Release, Distribution Unlimited
 1.238 16-Sep-2009  pooka branches: 1.238.4; 1.238.6;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.237 27-May-2009  pooka POOL_INIT -> pool_init
 1.236 18-Mar-2009  cegger bzero -> memset
 1.235 18-Mar-2009  cegger bcmp -> memcmp
 1.234 29-Jan-2009  pooka branches: 1.234.2;
stinkset purge: POOL_INIT -> pool_init
also, make the syncache pool static in scope
 1.233 13-Oct-2008  pooka branches: 1.233.2;
POOL_INIT -> pool_init
 1.232 10-Oct-2008  ad tcp_close: rearrange sequence of events slightly to make this atomic.
It was possible for a half-destroyed tcpcb to be visible, as softnet_lock
was being dropped.
 1.231 02-May-2008  ad branches: 1.231.2; 1.231.6;
PR kern/38497 Out of memory allocating ksiginfo

Work around: don't acquire softnet_lock in protocol drain routines.
 1.230 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.229 26-Apr-2008  yamt branches: 1.229.2;
tcp_init: don't forget to allocate tcpstat_percpu.
 1.228 24-Apr-2008  ad Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.227 12-Apr-2008  thorpej branches: 1.227.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
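
A minimal sketch of per-CPU counters that are summed only on request follows. The CPU count, the counter names and the functions are hypothetical; the kernel's real per-CPU storage and sysctl plumbing are not shown.

```c
#include <stdint.h>

#define EX_NCPU		4	/* hypothetical number of CPUs */

enum {				/* hypothetical counter indices */
	EX_STAT_SNDTOTAL,
	EX_STAT_RCVTOTAL,
	EX_STAT_DROPS,
	EX_NSTAT
};

/* One private array of counters per CPU; no sharing on the hot path. */
static uint64_t ex_percpu_stats[EX_NCPU][EX_NSTAT];

/* Bump a counter on the given CPU's private array. */
void
ex_stat_inc(unsigned int cpu, unsigned int idx)
{
	ex_percpu_stats[cpu][idx]++;
}

/* Collate: sum every CPU's counters into one array for the reader. */
void
ex_stat_collate(uint64_t out[EX_NSTAT])
{
	for (unsigned int i = 0; i < EX_NSTAT; i++) {
		out[i] = 0;
		for (unsigned int cpu = 0; cpu < EX_NCPU; cpu++)
			out[i] += ex_percpu_stats[cpu][i];
	}
}
```

The update path touches only CPU-local memory, and the cost of summing is paid by the comparatively rare reader.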
 1.226 08-Apr-2008  thorpej Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
 1.225 27-Mar-2008  cube - Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetic that leads to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and by me for the SACK issue.
 1.224 29-Feb-2008  matt Rework tcp congctl selection code so that the congctl entries can be const.
Don't access tcp_congctl stuff outside of tcp_congctl.c, use routines to
update t_congctl. This code is now slightly more complicated.
 1.223 27-Feb-2008  matt Convert stragglers to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.222 05-Feb-2008  yamt branches: 1.222.2; 1.222.6;
- start tcp timestamp from 1 instead of 0.
- add a comment to explain why:
+ * We start with 1, because 0 doesn't work with linux, which
+ * considers timestamp 0 in a SYN packet as a bug and disables
+ * timestamps.
 1.221 14-Jan-2008  dyoung Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase
in in_losing().
 1.220 20-Dec-2007  martin A few missing ifdefs to make non-INET6 kernels build again.
 1.219 20-Dec-2007  dyoung Poison struct route->ro_rt uses in the kernel by changing the name
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.

Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.

Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.

Make ND6_HINT an inline, lowercase subroutine, nd6_hint.

I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
 1.218 02-Aug-2007  rmind branches: 1.218.4; 1.218.10; 1.218.12; 1.218.16; 1.218.20;
TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are badly needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.217 19-Jul-2007  dyoung Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.216 09-Jul-2007  ad branches: 1.216.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.215 20-Jun-2007  christos - per socket keepalive settings
- settable connection establishment timeout
 1.214 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
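
The sockaddr routines in item 1 can be modelled in userland roughly as follows. This is only a sketch under stated assumptions: calloc(3) stands in for the per-family pools, the 'flags' argument is omitted, and struct model_sockaddr, the model_ prefix, and the family/length table are hypothetical names invented for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sockaddr: length, family, payload. */
struct model_sockaddr {
	uint8_t sa_len;
	uint8_t sa_family;
	char    sa_data[30];
};

/* Illustrative family numbers and lengths (assumed, not authoritative). */
enum { MODEL_AF_INET = 2, MODEL_AF_INET6 = 24 };

static uint8_t model_family_len(uint8_t af)
{
	switch (af) {
	case MODEL_AF_INET:  return 16;	/* assumed sizeof(struct sockaddr_in) */
	case MODEL_AF_INET6: return 28;	/* assumed sizeof(struct sockaddr_in6) */
	default:             return 0;	/* unknown family: no "pool" */
	}
}

/* Like sockaddr_alloc(): NULL on failure, else sa_family/sa_len are set. */
struct model_sockaddr *model_sockaddr_alloc(uint8_t af)
{
	uint8_t len = model_family_len(af);
	struct model_sockaddr *sa;

	if (len == 0)
		return NULL;
	sa = calloc(1, sizeof(*sa));
	if (sa == NULL)
		return NULL;
	sa->sa_len = len;
	sa->sa_family = af;
	return sa;
}

/* Like sockaddr_copy(): the families must match (the KASSERT in the text). */
struct model_sockaddr *model_sockaddr_copy(struct model_sockaddr *dst,
    const struct model_sockaddr *src)
{
	assert(dst->sa_family == src->sa_family);
	memcpy(dst, src, src->sa_len);
	return dst;
}

/* Like sockaddr_dup(): allocate from the source's family, then copy. */
struct model_sockaddr *model_sockaddr_dup(const struct model_sockaddr *src)
{
	struct model_sockaddr *sa = model_sockaddr_alloc(src->sa_family);

	if (sa == NULL)
		return NULL;
	return model_sockaddr_copy(sa, src);
}

/* Like sockaddr_free(): give the storage back. */
void model_sockaddr_free(struct model_sockaddr *sa)
{
	free(sa);
}
```

As with rtcache_getdst() in item 3, every caller of the allocating routines has to be prepared for a NULL return.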
 1.213 12-Mar-2007  ad branches: 1.213.2;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.212 04-Mar-2007  christos branches: 1.212.2;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.211 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.210 10-Feb-2007  degroote branches: 1.210.2;
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic
 1.209 06-Dec-2006  yamt add some more tcp mowners.
 1.208 16-Nov-2006  christos branches: 1.208.2; 1.208.4;
__unused removal on arguments; approved by core.
 1.207 19-Oct-2006  yamt implement RFC3465 appropriate byte counting.
from Kentaro A. Kurahone, with minor adjustments by me.
the ack prediction part of the original patch was omitted because
it's a separate change. reviewed by Rui Paulo.
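
For reference, the window arithmetic that RFC 3465 describes can be sketched as below. This is not the committed patch: struct abc_state, its field names, and abc_on_ack() are hypothetical, and the limit L = 2 is an assumption (the RFC also allows the more conservative L = 1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection state; field names are illustrative. */
struct abc_state {
	uint32_t cwnd;		/* congestion window, bytes */
	uint32_t ssthresh;	/* slow-start threshold, bytes */
	uint32_t bytes_acked;	/* bytes ACKed since the last increase */
	uint32_t smss;		/* sender maximum segment size, bytes */
};

#define ABC_L 2	/* assumed limit: at most L*SMSS per ACK in slow start */

/* Account for an ACK that newly acknowledges `acked` bytes. */
void abc_on_ack(struct abc_state *s, uint32_t acked)
{
	assert(s->smss > 0);
	if (s->cwnd < s->ssthresh) {
		/* Slow start: grow by the bytes ACKed, capped at L*SMSS. */
		uint32_t limit = ABC_L * s->smss;

		s->cwnd += acked < limit ? acked : limit;
	} else {
		/* Congestion avoidance: one SMSS per cwnd worth of ACKed bytes. */
		s->bytes_acked += acked;
		if (s->bytes_acked >= s->cwnd) {
			s->bytes_acked -= s->cwnd;
			s->cwnd += s->smss;
		}
	}
}
```

Counting bytes instead of ACKs means a receiver that acknowledges every other segment does not slow the sender's window growth.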
 1.206 17-Oct-2006  dogcow now that we have -Wno-unused-parameter, back out all the tremendously ugly
code to gratuitously access said parameters.
 1.205 13-Oct-2006  dogcow more unused variable fallout.
 1.204 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.203 10-Oct-2006  dogcow change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)
 1.202 09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.201 07-Oct-2006  yamt revert tcp_close part of tcp_subr.c rev.1.200 because it's unnecessary.
all callers of tcp_close are at splsoftnet already:
tcp_close
    tcp_input ok
    tcp_disconnect
        tcp_usrreq ok
    tcp_usrclosed
        tcp_usrreq ok
        tcp_disconnect
    tcp_timer_2msl ok
    tcp_drop
        tcp_usrreq
        tcp_disconnect
        tcp_timer_rexmt ok
        tcp_timer_persist ok
        tcp_timer_keep ok
        tcp_input
    syn_cache_get
        tcp_input
 1.200 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.199 05-Sep-2006  rpaulo branches: 1.199.2; 1.199.4;
Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
 1.198 15-Apr-2006  christos Coverity CID 1149: Add KASSERT before deref.
 1.197 15-Apr-2006  christos Coverity CID 1148: Add KASSERT before deref.
 1.196 11-Dec-2005  christos branches: 1.196.4; 1.196.6; 1.196.8; 1.196.10; 1.196.12;
merge ktrace-lwp.
 1.195 10-Aug-2005  yamt move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.194 10-Aug-2005  yamt device independent part of ipv6 rx checksum offloading.
 1.193 20-Jul-2005  he Make this build without INET6.
 1.192 19-Jul-2005  christos Implement PMTU checks from:

http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html

1. Don't act on ICMP-need-frag immediately if adhoc checks on the
advertised MTU fail. The MTU update is delayed until a TCP retransmit
happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.

From OpenBSD.
 1.191 29-May-2005  christos branches: 1.191.2;
- add const
- remove bogus casts
- avoid nested variables
 1.190 18-Apr-2005  yamt fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.
 1.189 05-Apr-2005  kurahone Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.

Idea taken from FreeBSD.
 1.188 29-Mar-2005  yamt protect tcpipqent with splvm.
 1.187 16-Mar-2005  yamt branches: 1.187.2;
simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
build it directly from the segment list when needed.
 1.186 16-Mar-2005  yamt - use full sized segments unless we actually have SACKs to send.
- avoid TSO duplicate D-SACK.
- send SACKs regardless of TF_ACKNOW.
- don't clear rcv_sack_num when transmitting.

discussed on tech-net@.
 1.185 09-Mar-2005  simonb s/quence/quench/.
 1.184 09-Mar-2005  simonb Add an extra `i' to notifes/notifed.
 1.183 28-Feb-2005  jonathan Commit TCP SACK patches from Kentaro A. Kurahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond
to SACKs in wide-area traffic.

There are two independently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays in fast retransmission
(potentially explainable by an 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unexplained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over. Current hypothesis
is that the anomalies above may in fact be due to link/level (hardware,
driver, HAL, firmware) aberrations in the test setup, affecting both
Kentaro's wired-Ethernet NIC and in my two (different) WiFi NICs.
 1.182 26-Feb-2005  perry nuke trailing whitespace
 1.181 16-Feb-2005  briggs Initialize t_partialacks in the tcpcb template.
From Kentaro A. Kurahone.
 1.180 12-Feb-2005  heas ntohs->htons for ip6 plen (payload length).
It is not technically necessary to set plen here, since ip6_output() starts
off by calculating it, but leaving it keeps it consistent with other code.
 1.179 03-Feb-2005  perry ANSIfy function declarations
 1.178 02-Feb-2005  perry de-__P -- will ANSIfy .c files later.
 1.177 03-Jan-2005  heas branches: 1.177.2; 1.177.4;
In tcp_respond(), clear the m_pkthdr.csum_flags that was inherited from the
received packet so that the checksum is not performed twice. Also,
tcp_respond() does not fill-in the m_pkthdr.csum_data, so a h/w checksum may
have the wrong offset.

OK from Jason Thorpe.
 1.176 19-Dec-2004  christos yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.
 1.175 17-Dec-2004  christos Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out
 1.174 15-Dec-2004  thorpej Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.173 15-Sep-2004  yamt fix ipqent pool corruption problems. make tcp reass code use
its own pool of ipqent rather than sharing it with ip reass code.
PR/24782.
 1.172 18-May-2004  itojun fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
 1.171 01-May-2004  matt Use EVCNT_ATTACH_STATIC{,2}
 1.170 26-Apr-2004  itojun zero-clear ip6?pseudo before use
 1.169 26-Apr-2004  itojun declare ip6_hdr_pseudo (for kernel only) and use it for TCP MD5 signature
 1.168 26-Apr-2004  itojun sync comment with reality
 1.167 26-Apr-2004  itojun make TCP MD5 signature work with KAME IPSEC (#define IPSEC).

support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
 1.166 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c,
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.165 25-Apr-2004  simonb Initialise (most) pools from a link set instead of explicit calls
to pool_init. Untouched pools are ones that are either in arch-specific
code, or aren't initialised during initial system startup.

Convert struct session, ucred and lockf to pools.
 1.164 22-Apr-2004  tls Change the default state of two tunables; bring our TCP a little bit
closer to normal behaviour for the current century.

New Reno is now on by default (which is really the only reasonable
choice, since we don't do SACK); instead of an initial window of 1
for non-local nets, we now use Sally Floyd's magic 4K rule.
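
The "4K rule" presumably refers to the larger initial window of RFC 2414 (later RFC 3390), min(4*MSS, max(2*MSS, 4380 bytes)). A minimal sketch of that formula follows; initial_window() is a hypothetical name and this is not the kernel's macro.

```c
#include <assert.h>
#include <stdint.h>

/* Initial congestion window in bytes for a given MSS, per the RFC formula. */
uint32_t initial_window(uint32_t mss)
{
	uint32_t two = 2 * mss;
	uint32_t four = 4 * mss;
	uint32_t inner;

	assert(mss > 0);
	inner = two > 4380 ? two : 4380;	/* max(2*MSS, 4380) */
	return four < inner ? four : inner;	/* min(4*MSS, ...)  */
}
```

For a typical Ethernet MSS of 1460 bytes this yields 4380 bytes, i.e. three full-sized segments.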
 1.163 20-Apr-2004  itojun - respond to RST by ACK, as suggested in NISCC recommendation
- rate-limit ACKs against RSTs and SYNs
 1.162 17-Apr-2004  christos adjust to the sbreserve prototype change.
 1.161 05-Apr-2004  christos PR/22551: Invoking tcpcb's get erroneously free'd resulting in to_ticks <= 0
assertion. Approved by he.
 1.160 07-Jan-2004  matt branches: 1.160.2;
When accepting a peer's MSS, never let it drop below 256 (SLIP + TCP will
be the lowest MSS we should ever encounter).
 1.159 27-Oct-2003  thorpej - Change callout_setfunc() to require that the callout handle is already
initialized. Update the txp(4) to compensate.
- Statically initialize the TCP timer callout handles in the tcpcb
template. We still use callout_setfunc(), but that call is now much
less expensive. Add a comment that the compiler is likely to unroll
the loop (so don't sweat that it's there).
 1.158 25-Oct-2003  christos initialize off
 1.157 22-Oct-2003  thorpej Oops, a little too aggressive in the previous patch; TCP_TIMER_INIT()
still needs to be in tcp_newtcpcb(), for now. Pointed out by enami.
 1.156 22-Oct-2003  thorpej Rather than zeroing a tcpcb structure and filling in all the fields
individually, create a tcpcb template pre-initialized (and pre-zero'd)
with the static and mostly-static tcpcb parameters. The template is
now copied into the new tcpcb, which zeros and initializes most of the
tcpcb in one pass. The template is kept up-to-date as TCP sysctl
variables are changed.

Combined with the previous sb_max change, TCP socket creation is now
25% faster.
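
The template idea can be illustrated with a small model. All names here (struct model_tcpcb, its fields, model_template, and the two functions) are hypothetical and much simpler than the real tcpcb; the sketch only shows why one structure copy replaces a zero-fill plus many individual stores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced control block. */
struct model_tcpcb {
	int       t_state;
	unsigned  t_maxseg;	/* default segment size */
	unsigned  t_rttmin;	/* minimum RTT, in ticks */
	int       t_flags;
	void     *t_inpcb;	/* per-connection back pointer */
};

/* Pre-zeroed template holding the static and mostly-static defaults. */
struct model_tcpcb model_template = {
	.t_maxseg = 536,
	.t_rttmin = 2,
};

/* A sysctl-style update only has to touch the template. */
void model_set_default_mss(unsigned mss)
{
	model_template.t_maxseg = mss;
}

/* Creation: one copy zeroes and initializes, then per-connection fields. */
void model_newtcpcb(struct model_tcpcb *tp, void *inpcb)
{
	assert(tp != NULL);
	*tp = model_template;
	tp->t_inpcb = inpcb;
}
```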
 1.155 21-Oct-2003  thorpej Add event counters that measure FAST_MBSEARCH.
 1.154 25-Sep-2003  mycroft Fix glaring errors in recent changes.
 1.153 08-Sep-2003  itojun initialize ip_hl for ipsec policy lookup. PR kern/22715
 1.152 06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.151 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.150 22-Aug-2003  itojun tp could be null in tcp_respond()
 1.149 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.148 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.147 22-Aug-2003  jonathan Replace the set_socket() method of passing an extra struct socket*
argument to ip6_output() with a new explicit struct in6pcb* argument.
(The underlying socket can be obtained via in6pcb->inp6_socket.)

In preparation for fast-ipsec. Reviewed by itojun.
 1.146 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.145 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.144 20-Jul-2003  he As a temporary workaround, apply the fix from PR#20390, thereby
cooperating with the callout code in working around the race
condition caused by the TCP code's use of the callout facility.

Instead of unconditionally releasing memory in tcp_close() and
SYN_CACHE_PUT(), check whether any of the related callout handlers
are about to be invoked (but have not yet done callout_ack()), and
if so, just mark the associated data structure (tcpcb or syn cache
entry) as "dead", and test for this (and release storage) in the
callout handler functions.
 1.143 03-Jul-2003  ragge Make it possible to set TCP_INIT_WIN and TCP_INIT_WIN_LOCAL in the config
file as options.
 1.142 29-Jun-2003  fvdl branches: 1.142.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.141 29-Jun-2003  ragge Add code to remember where in the send queue of mbufs the last packet was
sent from. This change avoids a linear search through all mbufs when using
large TCP windows, and therefore permits high-speed connections on long
distances.

Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km. With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.

After discussions with Matt Thomas and Jason Thorpe.
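
The idea of remembering the last send position can be sketched with a plain linked list. struct model_buf, struct model_hint, and model_find() are hypothetical names; real code would also have to invalidate the hint whenever acknowledged data is dropped from the front of the queue, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf in the send queue. */
struct model_buf {
	struct model_buf *next;
	size_t            len;
};

/* Cached position: a buffer and the queue offset of its first byte. */
struct model_hint {
	struct model_buf *b;
	size_t            start;
};

/*
 * Find the buffer holding queue offset `off`, resuming from the hint when
 * the offset lies at or beyond it instead of walking from the head again.
 * On success the offset inside the buffer is stored in *inner.
 */
struct model_buf *model_find(struct model_buf *head, struct model_hint *h,
    size_t off, size_t *inner)
{
	struct model_buf *b = head;
	size_t start = 0;

	assert(h != NULL && inner != NULL);
	if (h->b != NULL && off >= h->start) {
		b = h->b;
		start = h->start;
	}
	while (b != NULL && off >= start + b->len) {
		start += b->len;
		b = b->next;
	}
	if (b != NULL) {
		h->b = b;		/* remember where we ended up */
		h->start = start;
		*inner = off - start;
	}
	return b;
}
```

Because a sender mostly transmits at increasing offsets, successive lookups start close to where the previous one stopped.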
 1.140 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.139 01-Mar-2003  thorpej Allow TCP connections to hosts on a local network to use a larger
slow start initial window. Default this larger initial window to
4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
 1.138 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.137 24-Nov-2002  scw Quell an uninitialised variable warning.
 1.136 22-Oct-2002  lukem fix typo in previous: s/tip/top/
 1.135 22-Oct-2002  simonb Micro-optimisation: don't check if the high bit is set and then mask it
off - just mask it off anyways. Saves a branch 50% of the time.
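
In C terms the change amounts to the following; the function names and the 32-bit width are assumptions made for illustration, since the log does not show the affected code.

```c
#include <stdint.h>

/* Before: test the high bit, then clear it only when it is set. */
uint32_t clear_high_branchy(uint32_t v)
{
	if (v & 0x80000000u)
		v &= 0x7fffffffu;
	return v;
}

/* After: clearing an already-clear bit is harmless, so just mask. */
uint32_t clear_high_masked(uint32_t v)
{
	return v & 0x7fffffffu;
}
```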
 1.134 25-Sep-2002  itojun minor KNF
 1.133 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.132 01-Jul-2002  itojun check AF_INET6 sockets when IPv4 "too big" messages arrive.
PR 17448
 1.131 09-Jun-2002  itojun whitespace
 1.130 29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.129 28-May-2002  itojun use arc4random() on tcp iss generation
 1.128 26-May-2002  itojun path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
 1.127 12-May-2002  matt branches: 1.127.2; 1.127.4;
Eliminate commons.
 1.126 07-May-2002  matt Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.
 1.125 27-Apr-2002  thorpej * Instrument tcp_build_datapkt().
* Remove the code that allocates a cluster if the packet would
fit in one; it totally defeats doing references to M_EXT mbufs
in the socket buffer. This drastically reduces the number of
data copies in the tcp_output() path for applications which use
large writes. Kudos to Matt Thomas for pointing me in the right
direction.
 1.124 15-Mar-2002  itojun have tcp6_drain
 1.123 08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.122 13-Nov-2001  lukem add RCSIDs
 1.121 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.120 04-Nov-2001  matt Change a few variable/tables to const since they are read-only.
 1.119 11-Sep-2001  thorpej branches: 1.119.2;
Use callouts for SYN cache timers, rather than traversing time queues
in tcp_slowtimo().
 1.118 10-Sep-2001  thorpej Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second). It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
 1.117 10-Sep-2001  thorpej Initialize TCP timer variables in a new function, tcp_timer_init().
 1.116 10-Sep-2001  thorpej Add explicit initialization of TCP timer state. A noop right now.
 1.115 10-Sep-2001  thorpej Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
 1.114 23-Jul-2001  itojun branches: 1.114.2;
wrap IPv6 code by #ifdef INET6
 1.113 23-Jul-2001  itojun use in6_maxmtu, not in_maxmtu, for IPv6 mss computation
 1.112 12-Jun-2001  wiz branches: 1.112.2;
receive, not recieve
 1.111 02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
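
The two-stage checksum described above (cache the pseudo-header sum, finish the sum later) can be sketched in portable C. The model_ names are hypothetical, addresses are taken in host order for simplicity, and the sketch handles a single contiguous buffer rather than an mbuf chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer, as big-endian 16-bit words, to a running sum. */
uint32_t model_sum_words(uint32_t acc, const uint8_t *p, size_t len)
{
	assert(p != NULL || len == 0);
	while (len > 1) {
		acc += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len == 1)
		acc += (uint32_t)p[0] << 8;	/* pad the odd byte with zero */
	return acc;
}

/* Fold the carries back in until the sum fits in 16 bits. */
uint16_t model_fold(uint32_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Stage 1: sum only the IPv4 pseudo-header; this value can be cached. */
uint16_t model_pseudo_sum(uint32_t src, uint32_t dst, uint8_t proto,
    uint16_t len)
{
	uint32_t acc = 0;

	acc += (src >> 16) + (src & 0xffff);
	acc += (dst >> 16) + (dst & 0xffff);
	acc += proto;
	acc += len;
	return model_fold(acc);
}

/* Stage 2: add the segment to the cached sum and complement the result. */
uint16_t model_finish_cksum(uint16_t partial, const uint8_t *seg, size_t len)
{
	return (uint16_t)~model_fold(model_sum_words(partial, seg, len));
}
```

Stage 2 is the part that either the output path or a capable interface can perform; a segment whose checksum field is already filled in sums to zero under the same routine.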
 1.110 24-May-2001  itojun call icmp6_mtudisc_update(foo, 0) even if ICMPv6 messages are very short.
let icmp6 layer decide whether we take PMTUD routes or not.
 1.109 21-Mar-2001  chs make this compile without rnd.
 1.108 20-Mar-2001  thorpej Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
hash method of generating TCP ISS values. Note, this code is experimental
and disabled by default (experimental enough that I don't export the
variable via sysctl yet, either). There are a couple of issues I'd
like to discuss with Steve, so this code should only be used by people
who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
4.4BSD, timestamps are created by incrementing the tcp_now variable
at 2 Hz; there's even a company out there that uses this to determine
web server uptime. According to Newsham's paper "The Problem With
Random Increments", while NetBSD's TCP ISS generation method is much
better than the "random increment" method used by FreeBSD and OpenBSD,
it is still theoretically possible to mount an attack against NetBSD's
method if the attacker knows how many times the tcp_iss_seq variable
has been incremented. By not leaking uptime information, we can make
that much harder to determine. So, we avoid the leak by giving each
TCP connection a timebase of 0.
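
One way to read "a timebase of 0" is that each connection records the global tick counter when it is created and sends timestamps relative to that value. The sketch below models this reading only; model_tcp_now, struct model_conn, and both functions are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the global tick counter that advances with uptime. */
uint32_t model_tcp_now;

/* Hypothetical per-connection state. */
struct model_conn {
	uint32_t ts_timebase;	/* tick counter at connection creation */
};

/* Record the current tick count as this connection's origin. */
void model_conn_init(struct model_conn *c)
{
	assert(c != NULL);
	c->ts_timebase = model_tcp_now;
}

/* Timestamp to send: ticks since the connection began, not since boot. */
uint32_t model_timestamp(const struct model_conn *c)
{
	return model_tcp_now - c->ts_timebase;
}
```

Unsigned subtraction keeps the difference correct even when the tick counter wraps around.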
 1.107 11-Feb-2001  itojun branches: 1.107.2;
pull latest kame pcbnotify code. synchronizes ICMPv6 path mtu discovery
behavior with other protocols (i.e. validation, use of hiwat/lowat).
 1.106 24-Jan-2001  itojun - record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.105 21-Dec-2000  itojun fix call to in6_pcbnotify. s/EMSGSIZE/PRC_MSGSIZE/.
 1.104 09-Dec-2000  itojun update icmp6 too big validation. the change is necessary since pmtud is
mandatory for IPv6 (so we can't just validate by using connected pcb - we need
to allow traffic from unconnected pcb to do pmtud).
- if the traffic is validated by xx_ctlinput, allow up to "hiwat" pmtud
route entries.
- if the traffic was not validated by xx_ctlinput, allow up to "lowat" pmtud
route entries (there's upper limit, so bad guys cannot blow up our routing
table).
sync with kame

XXX need to think again about default hiwat/lowat value.
XXX victim selection to help starvation case
 1.103 29-Oct-2000  itojun cleanup tcp_drop
 1.102 29-Oct-2000  itojun process IPv4 tcp RST packet right. reported by thorpej.
 1.101 19-Oct-2000  itojun remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.100 19-Oct-2000  itojun memcpy -> bcopy, for sync with kame tree
 1.99 18-Oct-2000  itojun verify ICMPv6 too big messages based on TCP pcbs, and/or IPsec SA.
TODO: udp6, and sendto consideration. as pmtud is mandatory for IPv6,
it is rather important for us to support those cases.
TODO: more testing
TODO: kame sync
 1.98 18-Oct-2000  thorpej Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
 1.97 18-Oct-2000  itojun move tcp syn cache parameters from in_proto.c to tcp_subr.c.
it makes more sense and helps INET6-only (INET-less) build.
 1.96 17-Oct-2000  itojun be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
 1.95 17-Oct-2000  thorpej Add an IP_MTUDISC flag to the flags that can be passed to
ip_output(). This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.

This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.

Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
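
The rule described in 1.95 can be sketched as below. This is only an illustration: the names sk_apply_df, struct sk_route and the value of SK_IP_MTUDISC are hypothetical and are not taken from the NetBSD sources. Only the DF bit value (0x4000 in the fragment-offset field) is the standard IPv4 constant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_IP_MTUDISC 0x0400u	/* hypothetical flag value for this sketch */
#define SK_IP_DF      0x4000u	/* IPv4 Don't Fragment bit (host order) */

/* Hypothetical stand-in for the route state consulted by ip_output(). */
struct sk_route {
	bool mtu_locked;	/* true if the route's MTU must not change */
};

/*
 * Return the fragment-offset field with DF set when the caller asked
 * for MTU discovery and the route's MTU is not locked.
 */
static uint16_t
sk_apply_df(uint16_t ip_off, unsigned int flags, const struct sk_route *rt)
{
	if ((flags & SK_IP_MTUDISC) != 0 && rt != NULL && !rt->mtu_locked)
		ip_off |= SK_IP_DF;
	return ip_off;
}
```

Keeping the decision in one place is what lets callers drop their own copies of the check, which is the redundancy the log entry refers to.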
 1.94 13-Oct-2000  itojun validate mbuf chain length on *_ctlinput. remote node may be able to
transmit a truncated icmp6 packet and panic the system. sync with kame.
 1.93 19-Sep-2000  itojun for t_template, allocate mbuf cluster only if really necessary.
this avoids too aggressive memory usage on heavy load web server, for example.
From: Kevin Lahey <kml@dotrocket.com>

release and reallocate t_template, if t_template->m_len changes.
(this happens if we connect to IPv4 mapped destination and then IPv6
destination, on a single AF_INET6 socket)

KAME 1.26 -> 1.28
 1.92 30-Jun-2000  itojun remove old mbuf assumption (ip header and tcp header are on the same mbuf).
this is for m_pulldown use. (sync with kame)
 1.91 30-Mar-2000  augustss branches: 1.91.4;
Remove register declarations.
 1.90 30-Mar-2000  simonb Delete redundant decl of zeroin6_addr, it's in <netinet6/in6_var.h>.
 1.89 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.88 29-Feb-2000  itojun ensure tcp window size does not overflow (16bit unsigned after window scale).
FreeBSD PR: 16914
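
A minimal sketch of the kind of guard 1.88 describes, assuming the window is held as a 32-bit byte count and is right-shifted by the negotiated scale before being written into the 16-bit header field. The helper name and macro are hypothetical, not the committed code.

```c
#include <stdint.h>

#define SK_TCP_MAXWIN 65535u	/* largest value the 16-bit window field holds */

/*
 * Clamp a receive window so that, after shifting by the window scale,
 * the result still fits in 16 bits. rcv_scale is at most 14 for TCP.
 */
static uint16_t
clamp_scaled_window(uint32_t win, unsigned int rcv_scale)
{
	uint32_t limit = (uint32_t)SK_TCP_MAXWIN << rcv_scale;

	if (win > limit)
		win = limit;
	return (uint16_t)(win >> rcv_scale);
}
```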
 1.87 06-Feb-2000  itojun don't chase mbuf pointer when it is NULL.
 1.86 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.85 15-Dec-1999  itojun do not overwrite traffic class field when we write IPv6 version field.
 1.84 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.83 12-Dec-1999  ragge Avoid GCC complaints (under some circumstances).
 1.82 08-Dec-1999  itojun do not drop from IP header to tcp option until sbappend(), to reduce
requirement to mbuf chain.
part of KAME sync, committed separately for its (possible) impact.
 1.81 23-Sep-1999  enami branches: 1.81.2; 1.81.8;
Make this compile without INET6.
 1.80 23-Sep-1999  itojun cleanup and correct TCP MSS consideration with IPsec headers.

MSS advertisement must always be:
max(if mtu) - ip hdr siz - tcp hdr siz
We violated this in the previous code so it was fixed.

tcp_mss_to_advertise() now takes af (af on wire) as its argument,
to compute right ip hdr siz.

tcp_segsize() will take care of IPsec header size.
One thing I'm not really sure is how to handle IPsec header size in
*rxsegsizep (inbound segment size estimation).
The current code subtracts possible *outbound* IPsec size from *rxsegsizep,
hoping that the peer is using the same IPsec policy as me.
It may not be applicable, could TCP guru please comment...
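
The advertisement rule stated in 1.80 can be written out as a small sketch. It assumes fixed header sizes without options (20 bytes for IPv4 and TCP, 40 bytes for IPv6); the function and enum names are hypothetical stand-ins and not the real tcp_mss_to_advertise() signature.

```c
#include <stdint.h>

/* Fixed header sizes without options. */
#define SK_IPV4_HDR_LEN 20u
#define SK_IPV6_HDR_LEN 40u
#define SK_TCP_HDR_LEN  20u

/* Hypothetical stand-in for the "af on wire" argument. */
enum sk_wire_af { SK_AF_INET, SK_AF_INET6 };

/* MSS to advertise = interface MTU - IP header size - TCP header size. */
static uint32_t
mss_to_advertise_sketch(uint32_t if_mtu, enum sk_wire_af af)
{
	uint32_t iphlen = (af == SK_AF_INET6) ? SK_IPV6_HDR_LEN : SK_IPV4_HDR_LEN;
	uint32_t overhead = iphlen + SK_TCP_HDR_LEN;

	if (if_mtu <= overhead)
		return 0;	/* MTU too small to carry any payload */
	return if_mtu - overhead;
}
```

With a 1500-byte MTU this gives 1460 for IPv4 and 1440 for IPv6. IPsec overhead is deliberately absent here, since the entry assigns that to tcp_segsize().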
 1.79 27-Aug-1999  itojun fix tcp mss consideration on ipsec operation.
now tcp-over-ipsec should not experience fragmentation due to
addition of ipsec header.

From: proff@suburbia.net (Julian Assange)
 1.78 25-Aug-1999  itojun When listening socket goes away, remove associated syn cache entries.
Stale syn cache entries are useless because none of them will be used
if there is no listening socket, as tcp_input looks up listening socket by
in_pcblookup*() before looking into syn cache.

This fixes race condition due to dangling socket pointer from syn cache
entries to listening socket (this was introduced when ipsec is merged in).

This should preserve currently implemented behavior (but not 4.4BSD
behavior prior to syn cache).

Tested in KAME repository before commit, but we'd better run some
regression tests.
 1.77 25-Aug-1999  itojun ctlinput handling must look at ip6_src, not ip6_dst.
(this makes path mtu handling wrong)
 1.76 09-Aug-1999  itojun return with doing nothing from xx_ctlinput(), when sa->sa_family
is not the expected one.

I see PRC_REDIRECT_HOST with sa->sa_family == AF_UNIX coming to
{tcp,udp}_ctlinput() when I use dhclient, and I feel like adding
more sanity checks, without logging - if we log it it is too noisy.
 1.75 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.74 23-Jul-1999  itojun do not include unnecessary include files.
 1.73 22-Jul-1999  itojun - implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.
 1.72 14-Jul-1999  itojun Use proper ip protocol # field and tcp hdr on sending RST against SYN,
when ip header and tcp header are not adjacent to each other
(i.e. when ip6 options are attached).

To test this, try
telnet @::1@::1 port
toward a port without responding server. Prior to the fix, the kernel will
generate broken RST packet.
 1.71 14-Jul-1999  drochner make sending of keepalive messages work again:
-remove bogus sanity check involving an uninitialized variable
-correct mbuf cluster allocation
-(non-critical) remove redundant check in cleanup after error
 1.70 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.69 02-Jul-1999  fvdl Fix for -Wuninitialized warnings broke compiles without INET6, refix.
 1.68 02-Jul-1999  itojun avoid "variable not initialized" warnings on some of the platforms.
 1.67 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.66 28-Feb-1999  explorer branches: 1.66.4; 1.66.6;
Don't mix in data just to stir the rnd pool. Extracting data will do that,
any network packets received might, too, so this is already taken care of.
 1.65 26-Jan-1999  thorpej Fix a slight error in previous. Rearrange some code in tcp_respond() so
that a DIAGNOSTIC check against the destination address is actually
checking the destination address. "oops."
 1.64 20-Jan-1999  thorpej Fix a problem pointed out by Charles Hannum; DF wasn't being set in
SYN,ACK packets during Path MTU Discovery. Fix tcp_respond() to do the
appropriate route lookup and set DF as appropriate.

Also, fixup similar code in tcp_output() to relookup the route if it
is down.
 1.63 18-Dec-1998  thorpej Add a lock around the TCPCB's sequence queue, to prevent tcp_drain()
from corrupting the queue if called from a device's interrupt context.

Similar in nature to the problem reported in PR #5684.
 1.62 08-Oct-1998  thorpej Use the pool allocator for ipqent structures.
 1.61 07-Oct-1998  thorpej Use the pool allocator for the tcpcb's TCP/IP header template.
 1.60 06-Oct-1998  matt Add a sysctl for newreno (default to off).
 1.59 19-Sep-1998  mycroft Always send a 0 window with a RST. Suggested by Darren Reed.
 1.58 04-Sep-1998  mycroft Fix a couple of bogons related to tcp_new_iss():
* Don't add tcp_iss_seq when creating a new ISS from TIME-WAIT state.
* Do the clock increment even when using the rnd device.
 1.57 02-Aug-1998  thorpej Use the pool allocator for tcpcbs.
 1.56 17-Jul-1998  thorpej Document that we are more conservative after doing MTU discovery than the
suggestion in draft-floyd-incr-init-win-03. Rather than scaling cwnd back
by the ratio of new segment size to old segment size, we perform a slow start
using the Initial Window, computed with the new segment size.
 1.55 17-Jul-1998  thorpej Clarify that we're using the Loss Window when we receive a source quench.
 1.54 12-May-1998  kml Changed initialization of peermss to ensure that it didn't have
the TCP and IP options lengths removed from it -- the IP options can
change over the course of a connection...
 1.53 07-May-1998  kml Change comments on tcp_mss_to_advertise to match actual arguments
 1.52 07-May-1998  thorpej Rework the syn cache code somewhat:
- Don't use home-grown queue manipulation. Use <sys/queue.h> instead. The
data structures are a little larger, but we are otherwise wasting the
memory chunk anyway (we're already a 64-byte malloc bucket).
- Fix a bug in the cache-is-full case: if the oldest element removed from
the first non-empty bucket was the only element in the bucket, the
bucket wouldn't be removed from the bucket cache, causing queue corruption
later.
- Optimize the syn cache timers by using PRT timers rather than home-grown
decrement-and-propagate timers.

This code is now a fair bit smaller, and significantly easier to read
and understand.
 1.51 06-May-1998  thorpej Use macros from tcp_timer.h to manipulate TCP timers, so that their
implementation can be changed easily.
 1.50 03-May-1998  thorpej Once again, move a declaration for the benefit of TUBA (grumble).
 1.49 29-Apr-1998  matt New TCP reassembly code. The new code reduces the memory needed by
out-of-order packets and builds the infrastructure needed for sending
SACK blocks (to be added shortly).
 1.48 29-Apr-1998  thorpej Make use of the work-arounds for ancient broken TCP peers run-time
conditional (tcp_compat_42). The kernel config option TCP_COMPAT_42
will still enable this by default, or disable this by default if the
option is not included (i.e. current behavior). This will be made a
sysctl soon.
 1.47 13-Apr-1998  kml Fix to ensure that the correct MSS is advertised for loopback
TCP connections by using the MTU of the interface. Also added
a knob, mss_ifmtu, to force all connections to use the MTU of
the interface to calculate the advertised MSS.
 1.46 31-Mar-1998  thorpej Fix a potential-congestion case in the larger initial congestion window
code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN
(active open) or SYN,ACK (passive open) was retransmitted, the initial
congestion window for the first slow start of that connection must be
one segment.
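
The rule in 1.46 reduces to a single branch. The sketch below assumes the configured initial window is expressed in segments; the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the congestion window for the first slow start of a connection.
 * If the SYN (or SYN,ACK) had to be retransmitted, fall back to one
 * segment regardless of the configured larger initial window.
 */
static uint32_t
initial_cwnd_sketch(uint32_t segsz, uint32_t init_win_segs,
    bool syn_retransmitted)
{
	if (syn_retransmitted)
		return segsz;
	return segsz * init_win_segs;
}
```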
 1.45 28-Mar-1998  thorpej Remove a comment in tcp_mss_to_advertise() that no longer applies.
 1.44 24-Mar-1998  kml Ensure that we take the IP option length into account when we calculate
the effective maximum send size for TCP. ip_optlen() and tcp_optlen()
should probably be inlined for efficiency.
 1.43 19-Mar-1998  kml Fix a retransmission bug introduced by the Brakmo and Peterson
RTO estimation changes. Under some circumstances it would return a value
of 0, while the old Van Jacobson RTO code would return a minimum of 3.
This would result in 12 retransmissions, each 1 second apart.
This takes care of those instances, and ensures that t_rttmin is
used everywhere as a lower bound.
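
A minimal sketch of the lower-bound idea in 1.43, assuming the estimator's result and the minimum are both in timer ticks. The helper name is hypothetical; only the role of t_rttmin as a floor comes from the entry.

```c
/*
 * Never let a computed retransmission timeout fall below the
 * connection's minimum (t_rttmin in the log entry).
 */
static int
rto_lower_bound_sketch(int computed_rto, int rttmin)
{
	return (computed_rto < rttmin) ? rttmin : computed_rto;
}
```

An estimator result of 0 with a minimum of 3 therefore yields 3, which is the case the entry says the older Van Jacobson code handled.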
 1.42 17-Mar-1998  kml Ensure that the TCP segment size reflects the size of TCP options
in the packet. This fixes a bug that was resulting in extra packets
in retransmissions (the second packet would be 12 bytes long,
reflecting the RFC1323 timestamp option size).
 1.41 19-Feb-1998  thorpej Update copyright (sigh, should have done this long ago).
 1.40 30-Jan-1998  mellon Take PCB off delayed ack queue before freeing.
 1.39 12-Jan-1998  scottr Use option header file for TCP_COMPAT_42
 1.38 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.37 31-Dec-1997  thorpej Implement a queue for delayed ACK processing. This queue is used in
tcp_fasttimo() in lieu of scanning all open TCP connections.
 1.36 11-Dec-1997  thorpej Implement an infrastructure to allow larger initial congestion windows.
The sysctl'able variable "tcp_init_win", when set to 0, selects an
auto-tuning algorithm for selecting the initial window, based on transmit
segment size, per discussion in the IETF tcpimpl working group.

Default initial window is still 1 segment, but will soon become 2 segments,
per discussion in tcpimpl.
 1.35 10-Dec-1997  thorpej Implement tcp_drain().
 1.34 11-Nov-1997  kml Remove an extraneous call to rtfree() in the path mtu discovery code;
this was causing negative reference counts on routes...
 1.33 08-Nov-1997  kml TCP MSS fixes to provide cleaner slow-start and recovery.
 1.32 18-Oct-1997  kml branches: 1.32.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
 1.31 17-Oct-1997  kml Path MTU Discovery support. This is turned off by default.
Use sysctl -w net.inet.icmp.mtudisc=1 to turn on.
Still to come: path removal after some period, black hole detection
 1.30 13-Oct-1997  explorer o Make usage of /dev/random dependent on
pseudo-device rnd # /dev/random and in-kernel generator
in config files.

o Add declaration to all architectures.

o Clean up copyright message in rnd.c, rnd.h, and rndpool.c to include
that this code is derived in part from Ted Ts'o's Linux code.
 1.29 10-Oct-1997  explorer Add hooks to use the kernel random system to generate TCP sequence numbers.
 1.28 22-Sep-1997  thorpej Fix several annoyances related to MSS handling in BSD TCP:
- Don't overload t_maxseg. Previous behavior was to set it to the min
of the peer's advertised MSS, our advertised MSS, and tcp_mssdflt
(for non-local networks). This breaks PMTU discovery running on
either host. Instead, remember the MSS we advertise, and use it
as appropriate (in silly window avoidance).
- Per last bullet, split tcp_mss() into several functions for handling
MSS (ours and peer's), and performing various tasks when a connection
becomes ESTABLISHED.
- Introduce a new function, tcp_segsize(), which computes the max size
for every segment transmitted in tcp_output(). This will eventually
be used to hook in PMTU discovery.
 1.27 23-Jul-1997  thorpej branches: 1.27.2;
Pull SYN_cache_branch down into the main line.
 1.26 24-Jun-1997  thorpej Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.
 1.25 10-Dec-1996  mycroft branches: 1.25.8;
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.24 15-Sep-1996  mycroft Hash unconnected PCBs.
 1.23 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.22 13-Feb-1996  christos branches: 1.22.4;
netinet prototypes
 1.21 31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.20 21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
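
The data-structure change in 1.20 can be illustrated as follows. This is a sketch with hypothetical type names, not the structures that were committed: the header stand-in contains no pointers, so its size is the same on 32-bit and 64-bit machines, and the list linkage lives in a separate element.

```c
#include <stddef.h>

/* Stand-in for an on-wire header: fixed size, no embedded pointers. */
struct hdr_sketch {
	unsigned char bytes[20];
};

/* Separate queue element that does the threading and points at a header. */
struct qent_sketch {
	struct qent_sketch *next;
	struct hdr_sketch  *hdr;
};

/* Walk the queue and count its elements. */
static size_t
qent_count(const struct qent_sketch *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return n;
}
```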
 1.19 12-Jun-1995  mycroft branches: 1.19.2;
Fix bogon in previous.
 1.18 12-Jun-1995  mycroft Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.
 1.17 12-Jun-1995  mycroft Oops. Make source quench work again.
 1.16 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.15 11-Jun-1995  mycroft As suggested by Brakmo and Peterson:
* Don't add the extra 1/8 of the mss when ramping up the congestion window.
* Scale the RTT values slightly to adjust for rounding errors.
* Set the lower bound of the RTO to RTT+2.
 1.14 04-Jun-1995  mycroft Clean up many more casts.
 1.13 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.12 14-Oct-1994  mycroft Don't return received data to the user until the initial handshake is complete.
Also use TCPS_HAVEESTABLISHED() in a few other places.
 1.11 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.10 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.9 10-Jan-1994  mycroft Change the counters to be all the same type -- u_long.
 1.8 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.7 08-Jan-1994  mycroft Prototypes.
 1.6 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.3 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.2 12-Apr-1993  mycroft Ignore forged ICMP_UNREACH with dport==0 and sport==0.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.19.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.22.4.1 10-Dec-1996  mycroft From trunk:
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.25.8.4 28-Jun-1997  thorpej KNF.
 1.25.8.3 26-Jun-1997  thorpej u_short -> u_int16_t.
 1.25.8.2 26-Jun-1997  thorpej Update from trunk.
 1.25.8.1 14-May-1997  mellon More of David Borman's SYN cache patches for 4.4BSD-lite2:

tcp_respond:

- return an error code if the response could not be sent, zero
otherwise (was void).

- generate SYN/ACK packet if tp==0 and SYN bit is set. Do
not adjust window in this case.

- if SYN bit is set, use ti_off as passed in rather than
setting it locally.


tcp_ctlinput:

- call syn_cache_unreach if in_pcbnotify fails.
 1.27.2.2 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.27.2.1 29-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.32.2.8 09-May-1998  mycroft Pull up patch from kml.
 1.32.2.7 05-May-1998  mycroft Pull up 1.42, per request of kml.
 1.32.2.6 05-May-1998  mycroft Pull up 1.43, per request of kml.
 1.32.2.5 07-Feb-1998  mellon Pull up 1.40 (mellon)
 1.32.2.4 29-Jan-1998  mellon Fix botched merge
 1.32.2.3 29-Jan-1998  mellon Pull up 1.35-1.38 (thorpej)
 1.32.2.2 12-Nov-1997  thorpej Pull up from trunk: nuke extra rtfree()
 1.32.2.1 08-Nov-1997  thorpej Pull up from trunk: TCP MSS fixes to provide cleaner slow-start and recovery.
(kml)
 1.66.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.66.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.66.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.66.4.2 02-Aug-1999  thorpej Update from trunk.
 1.66.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.81.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.81.2.6 27-Mar-2001  bouyer Sync with HEAD.
 1.81.2.5 11-Feb-2001  bouyer Sync with HEAD.
 1.81.2.4 05-Jan-2001  bouyer Sync with HEAD
 1.81.2.3 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.81.2.2 22-Nov-2000  bouyer Sync with HEAD.
 1.81.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.91.4.5 09-Sep-2003  msaitoh Pull up rev. 1.153 (requested by itojun in ticket #78):
Initialize ip_hl for ipsec policy lookup. Fixes PR 22715.
 1.91.4.4 06-Apr-2001  he Pull up revision 1.106 (via patch, requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.91.4.3 17-Oct-2000  tv Pullup 1.94 [itojun]:
validate mbuf chain length on *_ctlinput. remote node may be able to
transmit a truncated icmp6 packet and panic the system. sync with kame.
 1.91.4.2 19-Sep-2000  itojun pullup 1.92 -> 1.93 (requested by thorpej)

for t_template, allocate mbuf cluster only if really necessary.
this avoids too aggressive memory usage on heavy load web server, for example.
From: Kevin Lahey <kml@dotrocket.com>

release and reallocate t_template, if t_template->m_len changes.
(this happens if we connect to IPv4 mapped destination and then IPv6
destination, on a single AF_INET6 socket)

KAME 1.26 -> 1.28
 1.91.4.1 23-Jul-2000  itojun pullup from main trunk (approved by releng-1-5)

remove old mbuf assumption (ip header and tcp header are on the same mbuf).
this is for m_pulldown use. (sync with kame)

1.108 -> 1.109 syssrc/sys/netinet/tcp_input.c
1.56 -> 1.57 syssrc/sys/netinet/tcp_output.c
1.91 -> 1.92 syssrc/sys/netinet/tcp_subr.c
 1.107.2.13 11-Dec-2002  thorpej Sync with HEAD.
 1.107.2.12 11-Nov-2002  nathanw Catch up to -current
 1.107.2.11 18-Oct-2002  nathanw Catch up to -current.
 1.107.2.10 27-Aug-2002  nathanw Catch up to -current.
 1.107.2.9 01-Aug-2002  nathanw Catch up to -current.
 1.107.2.8 20-Jun-2002  nathanw Catch up to -current.
 1.107.2.7 04-May-2002  thorpej Update from trunk.
 1.107.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.107.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.107.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.107.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.107.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.107.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.112.2.7 10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.112.2.6 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.112.2.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.112.2.4 16-Mar-2002  jdolecek Catch up with -current.
 1.112.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.112.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.112.2.1 03-Aug-2001  lukem update to -current
 1.114.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.119.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.127.4.5 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #1680)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
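
One way to express "completely to the left of the receive window" is with modular sequence-number comparison, sketched below. The SEQ_LT/SEQ_LEQ macros are defined locally in the usual BSD style; the predicate name and the exact boundary conditions are assumptions made for illustration and may differ from the patched code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence-space comparisons that tolerate 32-bit wraparound. */
#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)

/*
 * True if a segment starts before rcv_nxt and also ends at or before
 * rcv_nxt, i.e. no part of it falls inside the receive window.
 */
static bool
rst_left_of_window(uint32_t seg_seq, uint32_t seg_len, uint32_t rcv_nxt)
{
	uint32_t seg_end = seg_seq + seg_len;

	return SEQ_LT(seg_seq, rcv_nxt) && SEQ_LEQ(seg_end, rcv_nxt);
}
```

A RST for which this predicate holds would simply be ignored under the first rule in the entry.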
 1.127.4.4 22-Oct-2003  jmc Pullup rev 1.144 (requested by he in ticket #1530)


Introduce a new INVOKING status for callouts, and use it to close
a race condition in the TCP code. Fixes PR#20390.
 1.127.4.3 09-Sep-2003  tron Pull up revision 1.153 (requested by itojun in ticket #1452):
initialize ip_hl for ipsec policy lookup. PR kern/22715
 1.127.4.2 05-Sep-2003  tron Pull up revision 1.128 (requested by tls in ticket #1445):
path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
 1.127.4.1 02-Jul-2002  lukem Pull up revision 1.132 (requested by itojun in ticket #422):
check AF_INET6 sockets when IPv4 "too big" messages arrive.
PR 17448
 1.127.2.4 29-Aug-2002  gehenna catch up with -current.
 1.127.2.3 15-Jul-2002  gehenna catch up with -current.
 1.127.2.2 20-Jun-2002  gehenna catch up with -current.
 1.127.2.1 30-May-2002  gehenna Catch up with -current.
 1.142.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.142.2.10 01-Apr-2005  skrll Sync with HEAD.
 1.142.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.142.2.8 17-Feb-2005  skrll Sync with HEAD.
 1.142.2.7 15-Feb-2005  skrll Sync with HEAD.
 1.142.2.6 04-Feb-2005  skrll Sync with HEAD.
 1.142.2.5 17-Jan-2005  skrll Sync with HEAD.
 1.142.2.4 18-Dec-2004  skrll Sync with HEAD.
 1.142.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.142.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.142.2.1 03-Aug-2004  skrll Sync with HEAD
 1.160.2.5 19-Sep-2004  he Apply patch (requested by yamt in ticket #861):
Fix this so it compiles again; we cannot use the link
set macros for pool initialization on this release branch.
 1.160.2.4 18-Sep-2004  he Pull up revision 1.173 (requested by yamt in ticket #861):
Fix ipqent pool corruption problems. Make the TCP reassembly
code use its own pool of ipqent rather than sharing it with
the IP reassembly code. Fixes PR#24782.
 1.160.2.3 22-Apr-2004  tron Pull up revision 1.164 (requested by tls in ticket #172):
Change the default state of two tunables; bring our TCP a little bit
closer to normal behaviour for the current century.
New Reno is now on by default (which is really the only reasonable
choice, since we don't do SACK); instead of an initial window of 1
for non-local nets, we now use Sally Floyd's magic 4K rule.
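
The "4K rule" is presumably the initial-window formula from RFC 2414 and RFC 3390, min(4*MSS, max(2*MSS, 4380 bytes)). That reading is an assumption, and the function below is a hypothetical sketch of the formula rather than the code in revision 1.164.

```c
#include <stdint.h>

/* Initial window in bytes: min(4*MSS, max(2*MSS, 4380)). */
static uint32_t
initial_window_sketch(uint32_t mss)
{
	uint32_t lower = 2 * mss;
	uint32_t upper = 4 * mss;

	if (lower < 4380)
		lower = 4380;
	return (upper < lower) ? upper : lower;
}
```

For a 1460-byte MSS this yields 4380 bytes (three segments), and for a 536-byte MSS it yields 2144 bytes (four segments).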
 1.160.2.2 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #169)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.160.2.1 06-Apr-2004  jmc Pullup rev 1.161 (requested by christos in ticket #69)

Invoking tcpcb's get erroneously free'd resulting in to_ticks <= 0 assertion.
PR#22551
 1.177.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.177.4.1 12-Feb-2005  yamt sync with head.
 1.177.2.1 29-Apr-2005  kent sync with -current
 1.187.2.3 06-May-2005  tron Pull up revision 1.190 (requested by yamt in ticket #251):
fix problems related to loopback interface checksum omission. PR/29971.
- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)
ok'ed by Jason Thorpe.
 1.187.2.2 06-May-2005  tron Pull up revision 1.189 (requested by kurahone in ticket #199):
Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.
Idea taken from FreeBSD.
 1.187.2.1 04-Apr-2005  tron Pull up revision 1.188 (requested by yamt in ticket #90):
protect tcpipqent with splvm.
 1.191.2.7 17-Mar-2008  yamt sync with head.
 1.191.2.6 11-Feb-2008  yamt sync with head.
 1.191.2.5 21-Jan-2008  yamt sync with head
 1.191.2.4 03-Sep-2007  yamt sync with head.
 1.191.2.3 26-Feb-2007  yamt sync with head.
 1.191.2.2 30-Dec-2006  yamt sync with head.
 1.191.2.1 21-Jun-2006  yamt sync with head.
 1.196.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.196.10.1 19-Apr-2006  elad sync with head.
 1.196.8.2 14-Sep-2006  yamt sync with head.
 1.196.8.1 24-May-2006  yamt sync with head.
 1.196.6.1 22-Apr-2006  simonb Sync with head.
 1.196.4.3 09-Sep-2006  rpaulo sync with head
 1.196.4.2 05-Feb-2006  rpaulo Adapt to in6pcb removal.
 1.196.4.1 05-Feb-2006  rpaulo Bye netinet6/in6_pcb.h.
 1.199.4.2 10-Dec-2006  yamt sync with head.
 1.199.4.1 22-Oct-2006  yamt sync with head
 1.199.2.2 12-Jan-2007  ad Sync with head.
 1.199.2.1 18-Nov-2006  ad Sync with head.
 1.208.4.2 03-Jun-2008  skrll Sync with netbsd-4.
 1.208.4.1 04-Jun-2007  wrstuden Update to today's netbsd-4.
 1.208.2.2 30-Mar-2008  jdc Pull up revisions:
src/sys/netinet/ip_input.c 1.263
src/sys/netinet/tcp_subr.c 1.225
(requested by cube in ticket #1109).

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.
 1.208.2.1 24-May-2007  pavel branches: 1.208.2.1.4;
Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly
supported
Change the ipcomp logic

Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar
to the sysctl kame interface.

Choose the right default policy, depending on the address family of the
desired policy

Increase the refcount for the default ipv6 policy so nobody can reclaim it

Always compute the sp index even if we don't have any sp in spd. It will
let us choose the right default policy (based on the address family
requested).
While here, fix an error message

Use a dynamic array instead of a static array to decompress. It lets us
decompress any data, whatever the ratio of decompressed data to compressed
data.
It fixes the last issues with fast_ipsec and ipcomp.
While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
 1.208.2.1.4.1 30-Mar-2008  jdc Pull up revisions:
src/sys/netinet/ip_input.c 1.263
src/sys/netinet/tcp_subr.c 1.225
(requested by cube in ticket #1109).

- Make sure we send a reasonable fragment size when IPSEC is configured.
Otherwise we end up sending a dubious "0" whenever we cannot find a
proper association for the packet.
- Reset sack_newdata along with snd_nxt to avoid improper integer
arithmetics that lead to sending data from an incorrect place in the
stream, making it appear as corrupted.

Patch by Michael Van Elst, based on an analysis by Michael for the IPSEC
stuff and I for the SACK issue.
 1.210.2.4 07-May-2007  yamt sync with head.
 1.210.2.3 24-Mar-2007  yamt sync with head.
 1.210.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.210.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.212.2.5 20-Aug-2007  ad Sync with HEAD.
 1.212.2.4 15-Jul-2007  ad Sync with head.
 1.212.2.3 01-Jul-2007  ad Adapt to callout API change.
 1.212.2.2 08-Jun-2007  ad Sync with head.
 1.212.2.1 13-Mar-2007  ad Sync with head.
 1.213.2.1 11-Jul-2007  mjf Sync with head.
 1.216.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.218.20.2 02-Aug-2007  rmind TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are very needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.218.20.1 02-Aug-2007  rmind file tcp_subr.c was added on branch matt-mips64 on 2007-08-02 02:42:42 +0000
 1.218.16.2 19-Jan-2008  bouyer Sync with HEAD
 1.218.16.1 02-Jan-2008  bouyer Sync with HEAD
 1.218.12.1 26-Dec-2007  ad Sync with head.
 1.218.10.1 18-Feb-2008  mjf Sync with HEAD.
 1.218.4.2 23-Mar-2008  matt sync with HEAD
 1.218.4.1 09-Jan-2008  matt sync with HEAD
 1.222.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.222.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.222.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.222.2.1 24-Mar-2008  keiichi sync with head.
 1.227.2.1 18-May-2008  yamt sync with head.
 1.229.2.4 11-Mar-2010  yamt sync with head
 1.229.2.3 20-Jun-2009  yamt sync with head
 1.229.2.2 04-May-2009  yamt sync with head.
 1.229.2.1 16-May-2008  yamt sync with head.
 1.231.6.1 19-Oct-2008  haad Sync with HEAD.
 1.231.2.1 10-Oct-2008  skrll Sync with HEAD.
 1.233.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.233.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.234.2.2 23-Jul-2009  jym Sync with HEAD.
 1.234.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.238.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.238.4.2 31-May-2011  rmind sync with head
 1.238.4.1 21-Apr-2011  rmind sync with head
 1.242.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.242.2.2 30-Oct-2012  yamt sync with head
 1.242.2.1 17-Apr-2012  yamt sync with head
 1.243.2.2 05-Apr-2012  mrg sync to latest -current.
 1.243.2.1 18-Feb-2012  mrg merge to -current.
 1.246.2.1 31-Oct-2012  riz Pull up following revision(s) (requested by msaitoh in ticket #644):
sys/netinet/tcp_subr.c: revision 1.248
sys/kern/subr_cprng.c: revision 1.12
Fix a bug that kmem_alloc() is called from the interrupt context.
 1.248.2.3 03-Dec-2017  jdolecek update from HEAD
 1.248.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.248.2.1 23-Jun-2013  tls resync from head
 1.250.2.3 18-May-2014  rmind sync with head
 1.250.2.2 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.250.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.255.4.2 21-Feb-2015  martin Pull up following revision(s) (requested by he in ticket #530):
sys/netinet/tcp_output.c: revision 1.180
sys/netinet/tcp_input.c: revision 1.336
sys/netinet/tcp_usrreq.c: revision 1.203
share/man/man4/tcp.4: revision 1.30
sys/netinet/tcp.h: revision 1.31
sys/netinet/tcp_subr.c: revision 1.258
sys/netinet/tcp_var.h: revision 1.176
sys/netinet/tcp_var.h: revision 1.177
sys/sys/param.h: bump revision

Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).

Change the new counter variables in struct tcpcb to uint32_t, as
per christos' comments.
 1.255.4.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.257.2.8 28-Aug-2017  skrll Sync with HEAD
 1.257.2.7 05-Feb-2017  skrll Sync with HEAD
 1.257.2.6 05-Dec-2016  skrll Sync with HEAD
 1.257.2.5 09-Jul-2016  skrll Sync with HEAD
 1.257.2.4 19-Mar-2016  skrll Sync with HEAD
 1.257.2.3 22-Sep-2015  skrll Sync with HEAD
 1.257.2.2 06-Jun-2015  skrll Sync with HEAD
 1.257.2.1 06-Apr-2015  skrll Sync with HEAD
 1.266.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.266.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.269.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.270.6.3 09-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1662):

sys/netinet/tcp_subr.c: revision 1.286
sys/netinet/tcp_timer.c: revision 1.96
sys/netinet/in_var.h: revision 1.102
sys/netinet/in_var.h: revision 1.99

Don't increment the iss sequence on each connection because it exposes
information (Amit Klein)

Add some randomness to the iss offset

Use a random IPv4 ID because the shuffling algorithm used before could expose
information (Amit Klein)

mv <sys/cprng.h> include to the kernel portion
 1.270.6.2 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1661):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.270.6.1 03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater waits for references
to a target route (except for a reference by the updater itself) to be released.
 1.273.2.6 18-Jan-2019  pgoyette Synch with HEAD
 1.273.2.5 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.273.2.4 25-Jun-2018  pgoyette Sync with HEAD
 1.273.2.3 21-May-2018  pgoyette Sync with HEAD
 1.273.2.2 22-Apr-2018  pgoyette Sync with HEAD
 1.273.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.280.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.280.2.1 10-Jun-2019  christos Sync with HEAD
 1.282.4.3 09-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1229):

sys/netinet/tcp_subr.c: revision 1.286
sys/netinet/tcp_timer.c: revision 1.96
sys/netinet/in_var.h: revision 1.102
sys/netinet/in_var.h: revision 1.99

Don't increment the iss sequence on each connection because it exposes
information (Amit Klein)

Add some randomness to the iss offset

Use a random IPv4 ID because the shuffling algorithm used before could expose
information (Amit Klein)

mv <sys/cprng.h> include to the kernel portion
 1.282.4.2 07-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1226):

sys/netinet6/ip6_id.c: revision 1.19-1.21
sys/netinet6/ip6_var.h: revision 1.88
sys/netinet/ip_input.c: revision 1.400
sys/netinet/tcp_subr.c: revision 1.285
sys/netinet/ip6.h: revision 1.30

netinet: Enable random IP fragment ids by default (from riastradh)

netinet: Enable RFC 1948 pseudorandom TCP ISS selection by default.
(from riastradh)

netinet6: Mark randomid unused.

Will make merging and bisection easier if anything goes wrong with
flow label or fragment id randomization changes.
(from riastradh)

netinet/netinet6: Add necessary includes to make these standalone.
(from riastradh)

Replace randomid() by cprng_fast32()
 1.282.4.1 10-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #193):

sys/netinet/tcp_timer.h: revision 1.30
sys/netinet/tcp_input.c: revision 1.415
sys/netinet/tcp_usrreq.c: revision 1.225
sys/netinet/tcp_subr.c: revision 1.283

Clamp tcp timer quantities to reasonable ranges.
 1.284.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.288.4.1 01-Aug-2021  thorpej Sync with HEAD.
 1.296.8.1 02-Aug-2025  perseant Sync with HEAD
 1.7 29-Jun-2024  riastradh netinet: Use _NET_STAT* API instead of direct array access.

PR kern/58380
 1.6 04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.5 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.4 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
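
The separation described above can be sketched as follows. This is only an illustration of the embedding-plus-accessor-macro idea: the struct names, fields, and the pcb4_laddr() macro are hypothetical and are not the real struct inpcb layout or the real in4p_laddr() definition.

```c
#include <stdint.h>

/* Hypothetical common part shared by both address families. */
struct pcb_common {
	int		af;	/* address family tag */
	uint16_t	lport;	/* local port */
};

/* Hypothetical IPv4 PCB: common part first, then the small IPv4 address. */
struct pcb4 {
	struct pcb_common	common;
	uint32_t		laddr;
};

/* Hypothetical IPv6 PCB: same common part, larger address. */
struct pcb6 {
	struct pcb_common	common;
	uint8_t			laddr[16];
};

/*
 * Accessor in the spirit of in4p_laddr(inp): callers hold a pointer to the
 * common part and reach the IPv4-only field through a macro. The cast is
 * valid because the common part is the first member of struct pcb4.
 */
#define pcb4_laddr(p)	(((struct pcb4 *)(p))->laddr)
```

With this layout an IPv4 PCB no longer pays for the 16-byte IPv6 address, while code that only holds a pointer to the common part keeps working through the macro.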
 1.3 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.2 20-Sep-2022  ozaki-r syncache: make some functions static
 1.1 20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.2 20-Sep-2022  ozaki-r syncache: make some functions static
 1.1 20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.99 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.98 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.97 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.96 08-Mar-2021  christos Add some randomness to the iss offset
 1.95 03-May-2018  maxv branches: 1.95.6; 1.95.14;
Remove now unused tcpip.h includes. Some were already unused before.
 1.94 29-Mar-2018  maxv Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to
understand.

Also make tcp6_mtudisc() static in tcp_subr.c.
 1.93 19-Jan-2018  ozaki-r branches: 1.93.2;
Run tcp_slowtimo in workqueue if NET_MPSAFE

If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.

NFCI for !NET_MPSAFE
 1.92 28-Jul-2017  maxv Remove TCP_COMPAT_42. This feature is a workaround for a bug in the TCP
stack of BSD4.2. Having such features just does not make any sense, and
looking at the code, I'm not sure it actually works.
 1.91 25-Jul-2016  knakahara branches: 1.91.8;
fix: unlock in reverse order
 1.90 26-Apr-2016  ozaki-r branches: 1.90.2;
Sweep unnecessary route.h inclusions
 1.89 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.88 10-Nov-2014  maxv branches: 1.88.2;
Do not uselessly include <sys/malloc.h>.
 1.87 02-Jan-2014  pooka branches: 1.87.4;
Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.86 31-Aug-2011  plunky branches: 1.86.2; 1.86.12; 1.86.16;
NULL does not need a cast
 1.85 20-Apr-2011  gdt Rewrite comments about TCP RTO calculations.

Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code. This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073. Approved for Public
Release, Distribution Unlimited
 1.84 10-Nov-2008  uebayasi branches: 1.84.8; 1.84.10;
Whitespace.
 1.83 09-Nov-2008  bouyer Fix kern/39769: race condition in TCP timers
When a TCP timer is disarmed (with callout_stop()) in the general case
callout_invoking() isn't checked, so the timer handler could be run
when the current interrupt handler exits, although the timer is disarmed.
This case causes bad things like TCPT_REXMT and TCPT_PERSIST being both pending,
causing a panic (see the PR for details).
Close the issue by aborting the handler if the timer is not callout_expired()
(the EXPIRED flag being cleared by callout_stop()).
 1.82 10-Oct-2008  ad branches: 1.82.2; 1.82.4;
tcp_delack: test for TF_DELACK.
 1.81 28-Apr-2008  martin branches: 1.81.2; 1.81.6;
Remove clause 3 and 4 from TNF licenses
 1.80 24-Apr-2008  ad branches: 1.80.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.79 12-Apr-2008  thorpej branches: 1.79.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
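
A minimal sketch of the per-CPU statistics idea, assuming a fixed CPU count and plain arrays: each CPU increments only its own row, and the rows are summed when a reader asks for the totals. All names here (NCPU_SKETCH, stat_inc(), stats_collate()) are hypothetical; the kernel uses its own per-CPU storage API rather than a static two-dimensional array.

```c
#include <stddef.h>
#include <stdint.h>

#define NCPU_SKETCH	4	/* hypothetical number of CPUs */
#define NSTATS_SKETCH	3	/* hypothetical number of counters */

/* One row of counters per CPU, so the update path needs no shared lock. */
static uint64_t percpu_stats[NCPU_SKETCH][NSTATS_SKETCH];

/* Fast path: bump a counter in the calling CPU's private row. */
static inline void
stat_inc(unsigned cpu, size_t idx)
{
	percpu_stats[cpu][idx]++;
}

/* Slow path: sum every CPU's row into out[] when totals are requested. */
static void
stats_collate(uint64_t out[NSTATS_SKETCH])
{
	for (size_t i = 0; i < NSTATS_SKETCH; i++) {
		out[i] = 0;
		for (unsigned c = 0; c < NCPU_SKETCH; c++)
			out[i] += percpu_stats[c][i];
	}
}
```

The cost of collation is paid only on the rare read, which is the trade-off the commit message describes.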
 1.78 08-Apr-2008  thorpej Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
 1.77 20-Jun-2007  christos branches: 1.77.28;
- per socket keepalive settings
- settable connection establishment timeout
 1.76 09-Oct-2006  rpaulo branches: 1.76.2; 1.76.8; 1.76.10;
Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.75 14-May-2006  elad branches: 1.75.8; 1.75.10;
integrate kauth.
 1.74 15-Apr-2006  christos Coverity CID 1153: Add KASSERT before deref.
 1.73 11-Dec-2005  christos branches: 1.73.4; 1.73.6; 1.73.8; 1.73.10; 1.73.12;
merge ktrace-lwp.
 1.72 19-Jul-2005  christos Implement PMTU checks from:

http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html

1. Don't act on ICMP-need-frag immediately if adhoc checks on the
advertised MTU fail. The MTU update is delayed until a TCP retransmit
happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.

From OpenBSD.
 1.71 02-Mar-2005  mycroft branches: 1.71.2; 1.71.4;
Copyright maintenance.
 1.70 28-Feb-2005  jonathan Commit TCP SACK patches from Kentaro A. Kurahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond
to SACKs in wide-area traffic.

There are two independently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays in fast retransmission
(potentially explainable by an 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unexplained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over. Current hypothesis
is that the anomalies above may in fact be due to link-level (hardware,
driver, HAL, firmware) aberrations in the test setup, affecting both
Kentaro's wired-Ethernet NIC and in my two (different) WiFi NICs.
 1.69 03-Feb-2005  perry ANSIfy function declarations
 1.68 27-Jan-2005  mycroft Whoops. Exit fast recovery when handling a timeout.
 1.67 26-Jan-2005  mycroft Fix two problems in our TCP stack:

1) If an echoed RFC 1323 time stamp appears to be later than the current time,
ignore it and fall back to old-style RTT calculation. This prevents ending
up with a negative RTT and panicking later.

2) Fix NewReno. This involves a few changes:

a) Implement the send_high variable in RFC 2582. Our implementation is
subtly different; it is one *past* the last sequence number transmitted
rather than being equal to it. This simplifies some logic and makes
the code smaller. Additional logic was required to prevent sequence
number wraparound problems; this is not mentioned in RFC 2582.

b) Make sure we reset t_dupacks on new acks, but *not* on a partial ack.
All of the new ack code is pushed out into tcp_newreno(). (Later this
will probably be a pluggable function.) Thus t_dupacks keeps track of
whether we're in fast recovery all the time, with Reno or NewReno, which
keeps some logic simpler.

c) We do not need to update snd_recover when we're not in fast recovery.
See tech-net for an explanation of this.

d) In the gratuitous fast retransmit prevention case, do not send a packet.
RFC 2582 specifically says that we should "do nothing".

e) Do not inflate the congestion window on a partial ack. (This is done by
testing t_dupacks to see whether we're still in fast recovery.)

This brings the performance of NewReno back up to the same as Reno in a few
random test cases (e.g. transferring peer-to-peer over my wireless network).
I have not concocted a good test case for the behavior specific to NewReno.
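
The sequence number wraparound concern in item (a) can be illustrated with the usual modular comparison. This is a sketch under stated assumptions: tcp_seq_t, seq_lt(), seq_geq() and ack_covers_all_sent() are hypothetical names, and the last helper only illustrates the "one past the last sequence number transmitted" convention; it is not the code from tcp_newreno().

```c
#include <stdint.h>

typedef uint32_t tcp_seq_t;	/* hypothetical name for a 32-bit sequence number */

/*
 * Wraparound-safe "a is before b": take the difference modulo 2^32 and
 * look at its sign. Assumes the usual two's complement conversion to
 * int32_t, and that a and b are less than 2^31 apart.
 */
static inline int
seq_lt(tcp_seq_t a, tcp_seq_t b)
{
	return (int32_t)(a - b) < 0;
}

static inline int
seq_geq(tcp_seq_t a, tcp_seq_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * With send_high kept one *past* the last sequence number transmitted,
 * an ACK covers everything sent so far exactly when ack >= send_high.
 * Anything lower would be a partial ACK in this sketch.
 */
static inline int
ack_covers_all_sent(tcp_seq_t ack, tcp_seq_t send_high)
{
	return seq_geq(ack, send_high);
}
```

A plain `ack >= send_high` on the raw unsigned values would give the wrong answer once send_high wraps past zero, which is why the comparison goes through the signed difference.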
 1.66 02-Jan-2004  itojun branches: 1.66.8; 1.66.10;
some corrections from markus@openbsd;
- callout_ack() was called with wrong argument
 1.65 27-Oct-2003  itojun make it compilable with TCP_DEBUG defined
 1.64 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.63 20-Jul-2003  he As a temporary workaround, apply the fix from PR#20390, thereby
cooperating with the callout code in working around the race
condition caused by the TCP code's use of the callout facility.

Instead of unconditionally releasing memory in tcp_close() and
SYN_CACHE_PUT(), check whether any of the related callout handlers
are about to be invoked (but have not yet done callout_ack()), and
if so, just mark the associated data structure (tcpcb or syn cache
entry) as "dead", and test for this (and release storage) in the
callout handler functions.
 1.62 03-Feb-2003  thorpej branches: 1.62.2;
Test callout_pending(), not callout_active(), and eliminate now-unnecessary
callout_deactivate() calls.
 1.61 24-Nov-2002  scw Quell an uninitialised variable warning.
 1.60 22-Oct-2002  simonb Guard use of "so" in tcp_timer_persist() and tcp_timer_2msl() with
#ifdef TCP_DEBUG.
 1.59 09-Jun-2002  itojun whitespace
 1.58 26-May-2002  itojun path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
 1.57 13-Nov-2001  lukem branches: 1.57.8; 1.57.10;
add RCSIDs
 1.56 04-Nov-2001  matt Change a few variable/tables to const since they are read-only.
 1.55 11-Sep-2001  thorpej branches: 1.55.2;
Use callouts for SYN cache timers, rather than traversing time queues
in tcp_slowtimo().
 1.54 10-Sep-2001  thorpej Update copyrights.
 1.53 10-Sep-2001  thorpej Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second). It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
 1.52 10-Sep-2001  thorpej Initialize TCP timer variables in a new function, tcp_timer_init().
 1.51 10-Sep-2001  thorpej Split tcp_timers() into multiple functions, one for each timer,
and call them directly from tcp_slowtimo() (via a table) rather
than going through tcp_usrreq().

This will allow us to call TCP timers directly from callouts,
in a future revision.
 1.50 10-Sep-2001  thorpej Change the way receive idle time and round trip time are measured.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timestamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).

Based on similar changes in FreeBSD.
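
The timestamp-and-subtract scheme can be sketched in a few lines. The names tcp_ticks_t and ticks_since() are hypothetical; the only point shown is that subtracting two unsigned tick values yields the right delta even after the counter has wrapped once.

```c
#include <stdint.h>

/* Hypothetical stand-in for the type of the tcp_now tick counter. */
typedef uint32_t tcp_ticks_t;

/*
 * Ticks elapsed since `stamp` was taken. Unsigned subtraction is done
 * modulo 2^32, so the result stays correct when `now` has wrapped past
 * zero and is numerically smaller than `stamp`.
 */
static inline uint32_t
ticks_since(tcp_ticks_t now, tcp_ticks_t stamp)
{
	return now - stamp;
}
```

For example, a stamp taken 5 ticks before the counter wraps and read back 5 ticks after the wrap gives a delta of 10, with no per-connection increment needed in the slow timer.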
 1.49 10-Sep-2001  thorpej Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
 1.48 19-Oct-2000  itojun branches: 1.48.2; 1.48.4; 1.48.6;
remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.47 17-Oct-2000  itojun be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
 1.46 30-Mar-2000  augustss Remove register declarations.
 1.45 14-Jul-1999  itojun branches: 1.45.2;
Use proper ip protocol # field and tcp hdr on sending RST against SYN,
when ip header and tcp header are not adjacent to each other
(i.e. when ip6 options are attached).

To test this, try
telnet @::1@::1 port
toward a port without responding server. Prior to the fix, the kernel will
generate broken RST packet.
 1.44 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.43 10-Sep-1998  mouse branches: 1.43.8; 1.43.10;
Create tcp.keepidle, tcp.keepintvl, tcp.keepcnt, tcp.slowhz sysctls.
 1.42 04-Sep-1998  mycroft Fix a couple of bogons related to tcp_new_iss():
* Don't add tcp_iss_seq when creating a new ISS from TIME-WAIT state.
* Do the clock increment even when using the rnd device.
 1.41 17-Jul-1998  thorpej Comment where we use the Loss Window.
 1.40 02-Jun-1998  thorpej Loss window MUST be one segment, per draft-floyd-incr-init-win-03.
 1.39 11-May-1998  thorpej Make sure a timer is marked "disarmed" once it has expired.
 1.38 11-May-1998  thorpej Nuke TUBA per my note to tech-net; there's no reason to keep it around.
 1.37 07-May-1998  thorpej Define all TCP timers in terms of PRT timers.
 1.36 06-May-1998  thorpej Use macros from tcp_timer.h to manipulate TCP timers, so that their
implementation can be changed easily.
 1.35 01-May-1998  kml Remove bogus black hole discovery code
 1.34 29-Apr-1998  thorpej Make use of the work-arounds for ancient broken TCP peers run-time
conditional (tcp_compat_42). The kernel config option TCP_COMPAT_42
will still enable this by default, or disable this by default if the
option is not included (i.e. current behavior). This will be made a
sysctl soon.
 1.33 29-Apr-1998  kml Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.
 1.32 31-Mar-1998  thorpej Fix a potential-congestion case in the larger initial congestion window
code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN
(active open) or SYN,ACK (passive open) was retransmitted, the initial
congestion window for the first slow start of that connection must be
one segment.
 1.31 19-Mar-1998  kml Fix a retransmission bug introduced by the Brakmo and Peterson
RTO estimation changes. Under some circumstances it would return a value
of 0, while the old Van Jacobson RTO code would return a minimum of 3.
This would result in 12 retransmissions, each 1 second apart.
This takes care of those instances, and ensures that t_rttmin is
used everywhere as a lower bound.
 1.30 19-Feb-1998  thorpej Update copyright (sigh, should have done this long ago).
 1.29 12-Jan-1998  scottr Use option header file for TCP_COMPAT_42
 1.28 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.27 31-Dec-1997  thorpej Implement a queue for delayed ACK processing. This queue is used in
tcp_fasttimo() in lieu of scanning all open TCP connections.
 1.26 17-Dec-1997  thorpej From 4.4BSD-Lite2:
- When running the slow timers, skip PCBs in LISTEN state.
- When processing the persist timer, drop the connection if the connection
idle time exceeds the maximum backoff for retransmit. Part of
kern/2335 (pete@daemon.net).
 1.25 11-Dec-1997  thorpej Implement an infrastructure to allow larger initial congestion windows.
The sysctl'able variable "tcp_init_win", when set to 0, selects an
auto-tuning algorithm for selecting the initial window, based on transmit
segment size, per discussion in the IETF tcpimpl working group.

Default initial window is still 1 segment, but will soon become 2 segments,
per discussion in tcpimpl.
 1.24 11-Dec-1997  thorpej In tcp_fasttimo(), don't clear TF_DELACK; we need it to count delayed ACKs
in tcp_output(), and it will only be cleared in tcp_output() if the ACK was
transmitted successfully. Also, don't count delayed ACKs here; let tcp_output()
count them.
 1.23 09-Dec-1997  thorpej Cosmetic change: use intotcpcb() in tcp_fasttimo().
 1.22 08-Nov-1997  kml TCP MSS fixes to provide cleaner slow-start and recovery.
 1.21 13-Oct-1997  explorer branches: 1.21.2;
o Make usage of /dev/random dependent on
pseudo-device rnd # /dev/random and in-kernel generator
in config files.

o Add declaration to all architectures.

o Clean up copyright message in rnd.c, rnd.h, and rndpool.c to include
that this code is derived in part from Ted Ts'o's Linux code.
 1.20 10-Oct-1997  explorer Add hooks to use the kernel random system to generate TCP sequence numbers.
 1.19 28-Jul-1997  thorpej branches: 1.19.2;
Garbage-collect some "extern"s.
 1.18 23-Jul-1997  thorpej Pull SYN_cache_branch down into the main line.
 1.17 10-Dec-1996  mycroft branches: 1.17.8;
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.16 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.15 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.14 13-Feb-1996  christos branches: 1.14.4;
netinet prototypes
 1.13 12-Aug-1995  mycroft splnet --> splsoftnet
 1.12 18-Jun-1995  cgd convert pcb lists to CIRCLEQs, so that the end can be looked at more
easily, and so that the original (insque/remque) logic can be effectively
mimicked. (This fixes a bug in the previous set of list changes.)
also (since terminator is no longer null) reinstate uninitted list checks,
but mark them XXX.
 1.11 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.10 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.9 14-Oct-1994  mycroft Don't return received data to the user until the initial handshake is complete.
Also use TCPS_HAVEESTABLISHED() in a few other places.
 1.8 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.7 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.6 08-Jan-1994  mycroft Prototypes.
 1.5 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4 18-Dec-1993  mycroft Canonicalize all #includes.
 1.3 22-May-1993  cgd branches: 1.3.4;
add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.14.4.2 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.14.4.1 10-Dec-1996  mycroft From trunk:
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.17.8.2 28-Jun-1997  thorpej KNF.
 1.17.8.1 14-May-1997  mellon More of David Borman's SYN cache patches for Lite2:

tcp_slowtimo:

- call syn_cache_timer()
 1.19.2.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.21.2.3 05-May-1998  mycroft Pull up 1.31, per request of kml.
 1.21.2.2 29-Jan-1998  mellon Pull up 1.23-1.28 (thorpej)
 1.21.2.1 08-Nov-1997  thorpej Pull up from trunk: TCP MSS fixes to provide cleaner slow-start and recovery.
(kml)
 1.43.10.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.43.10.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.43.10.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.43.8.2 02-Aug-1999  thorpej Update from trunk.
 1.43.8.1 01-Jul-1999  thorpej Sync w/ -current.
 1.45.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.48.6.1 01-Oct-2001  fvdl Catch up with -current.
 1.48.4.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.48.4.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.48.4.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.48.2.5 11-Dec-2002  thorpej Sync with HEAD.
 1.48.2.4 11-Nov-2002  nathanw Catch up to -current
 1.48.2.3 20-Jun-2002  nathanw Catch up to -current.
 1.48.2.2 14-Nov-2001  nathanw Catch up to -current.
 1.48.2.1 21-Sep-2001  nathanw Catch up to -current.
 1.55.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.57.10.3 15-Mar-2004  jmc Pullup rev 1.66 (requested by he in ticket #1624)

callout_ack() was called with wrong argument
 1.57.10.2 22-Oct-2003  jmc Pullup rev 1.63 (requested by he in ticket #1530)


Introduce a new INVOKING status for callouts, and use it to close
a race condition in the TCP code. Fixes PR#20390.
 1.57.10.1 05-Sep-2003  tron Pull up revision 1.58 (requested by tls in ticket #1445):
path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
 1.57.8.2 20-Jun-2002  gehenna catch up with -current.
 1.57.8.1 30-May-2002  gehenna Catch up with -current.
 1.62.2.6 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.62.2.5 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.62.2.4 04-Feb-2005  skrll Sync with HEAD.
 1.62.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.62.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.62.2.1 03-Aug-2004  skrll Sync with HEAD
 1.66.10.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.66.10.1 12-Feb-2005  yamt sync with head.
 1.66.8.1 29-Apr-2005  kent sync with -current
 1.71.4.3 03-Sep-2007  yamt sync with head.
 1.71.4.2 30-Dec-2006  yamt sync with head.
 1.71.4.1 21-Jun-2006  yamt sync with head.
 1.71.2.1 18-Nov-2008  snj Pull up following revision(s) (requested by bouyer in ticket #1981):
sys/netinet/tcp_timer.c: revision 1.83 via patch
Fix kern/39769: race condition in TCP timers
When a TCP timer is disarmed (with callout_stop()), in the general case
callout_invoking() isn't checked, so the timer handler could still be run
when the current interrupt handler exits, although the timer is disarmed.
This can cause bad things like TCPT_REXMT and TCPT_PERSIST both being
pending, causing a panic (see the PR for details).
Close the issue by aborting the handler if the timer is not
callout_expired(). (the EXPIRED flag being cleared by callout_stop()).
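
The guard described in this fix can be sketched in user space. This is a
minimal illustration only: struct fake_callout and the fake_*() functions are
hypothetical stand-ins, not the real callout(9) API, and the sketch assumes a
handler that was already queued when the timer was stopped.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kernel callout; not the real callout(9) API. */
struct fake_callout {
	bool expired;	/* set when the timer fires, cleared by fake_stop() */
	int runs;	/* how many times the handler body actually ran */
};

/* The timer fires: mark it expired; the handler is now queued to run. */
static void
fake_fire(struct fake_callout *c)
{
	c->expired = true;
}

/* Disarm: clears the expired flag but cannot unqueue a pending handler. */
static void
fake_stop(struct fake_callout *c)
{
	c->expired = false;
}

/* Handler: abort if the timer was disarmed between firing and running. */
static void
fake_handler(struct fake_callout *c)
{
	if (!c->expired)
		return;		/* stopped in the meantime; do nothing */
	c->expired = false;
	c->runs++;		/* the real timer work would happen here */
}
```

Calling fake_fire(), then fake_stop(), then fake_handler() leaves runs at 0,
which is the behaviour the commit message asks for.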
 1.73.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.73.10.1 19-Apr-2006  elad sync with head.
 1.73.8.1 24-May-2006  yamt sync with head.
 1.73.6.1 22-Apr-2006  simonb Sync with head.
 1.73.4.2 09-Sep-2006  rpaulo sync with head
 1.73.4.1 05-Feb-2006  rpaulo <netinet6/in6_pcb.h> went away. Bye!
 1.75.10.1 22-Oct-2006  yamt sync with head
 1.75.8.1 18-Nov-2006  ad Sync with head.
 1.76.10.1 11-Jul-2007  mjf Sync with head.
 1.76.8.1 15-Jul-2007  ad Sync with head.
 1.76.2.1 18-Nov-2008  snj Pull up following revision(s) (requested by bouyer in ticket #1234):
sys/netinet/tcp_timer.c: revision 1.83 via patch
Fix kern/39769: race condition in TCP timers
When a TCP timer is disarmed (with callout_stop()), in the general case
callout_invoking() isn't checked, so the timer handler could still be run
when the current interrupt handler exits, although the timer is disarmed.
This can cause bad things like TCPT_REXMT and TCPT_PERSIST both being
pending, causing a panic (see the PR for details).
Close the issue by aborting the handler if the timer is not
callout_expired(). (the EXPIRED flag being cleared by callout_stop()).
 1.77.28.2 17-Jan-2009  mjf Sync with HEAD.
 1.77.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.79.2.1 18-May-2008  yamt sync with head.
 1.80.2.2 04-May-2009  yamt sync with head.
 1.80.2.1 16-May-2008  yamt sync with head.
 1.81.6.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.81.6.1 19-Oct-2008  haad Sync with HEAD.
 1.81.2.1 10-Oct-2008  skrll Sync with HEAD.
 1.82.4.1 14-Nov-2008  snj Pull up following revision(s) (requested by bouyer in ticket #56):
sys/netinet/tcp_timer.c: revision 1.83
Fix kern/39769: race condition in TCP timers
When a TCP timer is disarmed (with callout_stop()), in the general case
callout_invoking() isn't checked, so the timer handler could still be run
when the current interrupt handler exits, although the timer is disarmed.
This can cause bad things like TCPT_REXMT and TCPT_PERSIST both being pending,
causing a panic (see the PR for details).
Close the issue by aborting the handler if the timer is not callout_expired().
(the EXPIRED flag being cleared by callout_stop()).
 1.82.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.84.10.1 06-Jun-2011  jruoho Sync with HEAD.
 1.84.8.1 21-Apr-2011  rmind sync with head
 1.86.16.2 18-May-2014  rmind sync with head
 1.86.16.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.86.12.2 03-Dec-2017  jdolecek update from HEAD
 1.86.12.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.86.2.1 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.87.4.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.88.2.4 28-Aug-2017  skrll Sync with HEAD
 1.88.2.3 05-Oct-2016  skrll Sync with HEAD
 1.88.2.2 29-May-2016  skrll Sync with HEAD
 1.88.2.1 22-Sep-2015  skrll Sync with HEAD
 1.90.2.1 26-Jul-2016  pgoyette Sync with HEAD
 1.91.8.2 09-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1662):

sys/netinet/tcp_subr.c: revision 1.286
sys/netinet/tcp_timer.c: revision 1.96
sys/netinet/in_var.h: revision 1.102
sys/netinet/in_var.h: revision 1.99

Don't increment the iss sequence on each connection because it exposes
information (Amit Klein)

Add some randomness to the iss offset

Use a random IPv4 ID because the shuffling algorithm used before could expose
information (Amit Klein)

mv <sys/cprng.h> include to the kernel portion
 1.91.8.1 03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater tries to wait for references
to a target route (except for a reference by the updater itself) to be released.
 1.93.2.2 21-May-2018  pgoyette Sync with HEAD
 1.93.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.95.14.1 03-Apr-2021  thorpej Sync with HEAD.
 1.95.6.1 09-Mar-2021  martin Pull up following revision(s) (requested by christos in ticket #1229):

sys/netinet/tcp_subr.c: revision 1.286
sys/netinet/tcp_timer.c: revision 1.96
sys/netinet/in_var.h: revision 1.102
sys/netinet/in_var.h: revision 1.99

Don't increment the iss sequence on each connection because it exposes
information (Amit Klein)

Add some randomness to the iss offset

Use a random IPv4 ID because the shuffling algorithm used before could expose
information (Amit Klein)

mv <sys/cprng.h> include to the kernel portion
 1.30 06-Aug-2019  riastradh Clamp tcp timer quantities to reasonable ranges.

Reported-by: syzbot+259675123340bf46a6de@syzkaller.appspotmail.com
 1.29 19-Jan-2018  ozaki-r branches: 1.29.4; 1.29.8;
Run tcp_slowtimo in workqueue if NET_MPSAFE

If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.

NFCI for !NET_MPSAFE
 1.28 24-May-2011  gdt branches: 1.28.48;
Note units and current value for TCP_DELACK_TICKS.
 1.27 20-Apr-2011  gdt Rewrite comments about TCP RTO calculations.

Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code. This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073. Approved for Public
Release, Distribution Unlimited
 1.26 28-Apr-2008  martin branches: 1.26.22; 1.26.28;
Remove clause 3 and 4 from TNF licenses
 1.25 20-Jun-2007  christos branches: 1.25.28; 1.25.30; 1.25.32;
- per socket keepalive settings
- settable connection establishment timeout
 1.24 26-Sep-2006  jeremy branches: 1.24.8; 1.24.10;
Fixed a bug in the timeout range constraint macro that can cause a timeout
to break free of the constraint if the range minimum boundary is larger than
the maximum boundary.

Discovered by jmg@FreeBSD.org. (See FreeBSD's tcp_timer.h rev 1.31).
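
One plausible way such a bug arises is an "else if" between the two bound
checks. The sketch below is an assumption for illustration, not the actual
macro from tcp_timer.h, and the function names rangeset_else_if() and
rangeset_sequential() are hypothetical.

```c
/*
 * Illustrative sketch only; hypothetical names, written as functions
 * rather than as the macro the commit message refers to.
 */

/* Buggy form: the upper bound is skipped whenever the lower bound fires. */
static unsigned long
rangeset_else_if(unsigned long value, unsigned long tvmin, unsigned long tvmax)
{
	unsigned long tv = value;

	if (tv < tvmin)
		tv = tvmin;
	else if (tv > tvmax)
		tv = tvmax;
	return tv;
}

/* Fixed form: both bounds are always checked, so tv never exceeds tvmax. */
static unsigned long
rangeset_sequential(unsigned long value, unsigned long tvmin,
    unsigned long tvmax)
{
	unsigned long tv = value;

	if (tv < tvmin)
		tv = tvmin;
	if (tv > tvmax)
		tv = tvmax;
	return tv;
}
```

With value 3, minimum 10 and maximum 5, the first form yields 10 and so
escapes the maximum, while the second yields 5. For a well-ordered range the
two forms agree.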
 1.23 10-Dec-2005  elad branches: 1.23.20; 1.23.22;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.22 02-Jun-2005  riz branches: 1.22.2;
Fix some const fallout.
 1.21 04-Mar-2005  mycroft Re-add callout_active(), in a way compatible with the FreeBSD version, and use
it in the TCP stack to test which of the REXMT or PERSIST timer is in use.
This fixes a race condition that could cause "panic: tcp_output REXMT". See
tech-net for details.
 1.20 07-Aug-2003  agc branches: 1.20.8; 1.20.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.19 04-Feb-2003  thorpej branches: 1.19.2;
Use callout_setfunc() and callout_schedule().
 1.18 03-Feb-2003  thorpej Test callout_pending(), not callout_active(), and eliminate now-unnecessary
callout_deactivate() calls.
 1.17 04-Nov-2001  matt Change a few variable/tables to const since they are read-only.
 1.16 10-Sep-2001  thorpej branches: 1.16.2;
Update copyrights.
 1.15 10-Sep-2001  thorpej Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second). It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
 1.14 10-Sep-2001  thorpej Initialize TCP timer variables in a new function, tcp_timer_init().
 1.13 10-Sep-2001  thorpej Add explicit initialization of TCP timer state. A noop right now.
 1.12 10-Sep-2001  thorpej Split tcp_timers() into multiple functions, one for each timer,
and call it directly from tcp_slowtimo() (via a table) rather
than going through tcp_userreq().

This will allow us to call TCP timers directly from callouts,
in a future revision.
 1.11 10-Sep-2001  thorpej Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
 1.10 10-Sep-1998  mouse branches: 1.10.24; 1.10.26; 1.10.28;
Create tcp.keepidle, tcp.keepintvl, tcp.keepcnt, tcp.slowhz sysctls.
 1.9 07-May-1998  thorpej Define all TCP timers in terms of PRT timers.
 1.8 06-May-1998  thorpej Use the monotonically increasing slow timer timestamp provided by
the protocol dispatch layer for TCP timers. This saves having to
modify a potentially large number of timer values (which were shorts,
and expanded to ... a lot of code on the Alpha).
 1.7 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.6 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.10.28.1 01-Oct-2001  fvdl Catch up with -current.
 1.10.26.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.26.1 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.10.24.2 14-Nov-2001  nathanw Catch up to -current.
 1.10.24.1 21-Sep-2001  nathanw Catch up to -current.
 1.16.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.19.2.6 11-Dec-2005  christos Sync with head.
 1.19.2.5 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.19.2.4 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.19.2.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.19.2.2 18-Sep-2004  skrll Sync with HEAD.
 1.19.2.1 03-Aug-2004  skrll Sync with HEAD
 1.20.10.1 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.20.8.1 29-Apr-2005  kent sync with -current
 1.22.2.3 03-Sep-2007  yamt sync with head.
 1.22.2.2 30-Dec-2006  yamt sync with head.
 1.22.2.1 21-Jun-2006  yamt sync with head.
 1.23.22.1 22-Oct-2006  yamt sync with head
 1.23.20.1 18-Nov-2006  ad Sync with head.
 1.24.10.1 11-Jul-2007  mjf Sync with head.
 1.24.8.1 15-Jul-2007  ad Sync with head.
 1.25.32.1 16-May-2008  yamt sync with head.
 1.25.30.1 18-May-2008  yamt sync with head.
 1.25.28.1 02-Jun-2008  mjf Sync with HEAD.
 1.26.28.1 06-Jun-2011  jruoho Sync with HEAD.
 1.26.22.2 31-May-2011  rmind sync with head
 1.26.22.1 21-Apr-2011  rmind sync with head
 1.28.48.1 03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater tries to wait for references
to a target route (except for a reference by the updater itself) to be released.
 1.29.8.1 10-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #193):

sys/netinet/tcp_timer.h: revision 1.30
sys/netinet/tcp_input.c: revision 1.415
sys/netinet/tcp_usrreq.c: revision 1.225
sys/netinet/tcp_subr.c: revision 1.283

Clamp tcp timer quantities to reasonable ranges.
 1.29.4.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.238 04-Nov-2022  ozaki-r inpcb: rename functions to in6pcb_*
 1.237 04-Nov-2022  ozaki-r inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.236 30-Oct-2022  ozaki-r tcp: restore NULL check for inp in tcp_ctloutput
 1.235 29-Oct-2022  ozaki-r tcp: restore NULL checks for inp
 1.234 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
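
The embedding-plus-accessor pattern described here can be sketched as below.
The layouts are hypothetical and heavily simplified (every *_sketch name and
field is an assumption made for illustration); the real structures carry many
more fields.

```c
#include <stdint.h>

/* Hypothetical common part shared by both address families. */
struct inpcb_sketch {
	int inp_af;		/* address family */
	uint16_t inp_lport;	/* local port */
};

/* IPv4 PCB: embeds the common part first, then only IPv4-sized data. */
struct in4pcb_sketch {
	struct inpcb_sketch in4p_pcb;	/* must be the first member */
	uint32_t in4p_laddr_field;	/* local IPv4 address */
};

/* IPv6 PCB: same common part, followed by the larger IPv6 data. */
struct in6pcb_sketch {
	struct inpcb_sketch in6p_pcb;	/* must be the first member */
	uint8_t in6p_laddr_field[16];	/* local IPv6 address */
};

/*
 * Accessors: callers hold a pointer to the common part and reach the
 * family-specific data through a macro instead of a direct member access.
 */
#define in4p_laddr_sketch(inp) \
	(((struct in4pcb_sketch *)(inp))->in4p_laddr_field)
#define in6p_laddr_sketch(inp) \
	(((struct in6pcb_sketch *)(inp))->in6p_laddr_field)
```

Because the common part is the first member, a pointer to it can be converted
back to the enclosing structure, and an IPv4 PCB no longer pays for the
IPv6-sized address storage.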
 1.233 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.232 20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.231 28-Jun-2022  riastradh tcp(4): Bail early on sendoob if not connected.

XXX Not sure if testing tp->t_template is the right way to discern
this -- I just reached for it because the downstream crash is a panic
on tp->t_template == NULL in tcp_output.

XXX In principle this could try connecting to the address, except
it's not passed down from the logic in uipc_socket.c to tcp_sendoob.

Reported-by: syzbot+a01f4cfec72790855ce2@syzkaller.appspotmail.com
 1.230 04-Aug-2021  christos Get the value of the right variable (from RVP)
 1.229 08-Mar-2021  christos Remove the unused "addin" argument (it was always 0) and go back using
a random iss by default (instead of rfc1948)
 1.228 23-Nov-2020  chs Restore correct functioning of SIOCATMARK by removing the previous
change that was done to fix poll(POLLPRI | POLLRDBAND) and instead
add a separate flag to track when poll() should indicate that a
MSG_OOB byte is available. Re-fixes PR 54435 properly.
 1.227 17-Oct-2020  mlelstv branches: 1.227.2;
Fix RTT values reported by TCP_INFO.
 1.226 13-Apr-2020  maxv hardclock_ticks -> getticks()
 1.225 06-Aug-2019  riastradh branches: 1.225.6;
Clamp tcp timer quantities to reasonable ranges.

Reported-by: syzbot+259675123340bf46a6de@syzkaller.appspotmail.com
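
Clamping a user-supplied timer quantity can be sketched as below. The helper name and the bounds passed to it are hypothetical; the commit message does not say which ranges the kernel uses.

```c
#include <assert.h>

/* Hypothetical helper: force a timer value into the range [lo, hi]. */
static inline int
clamp_timer(int val, int lo, int hi)
{
	assert(lo <= hi);
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```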
 1.224 05-Feb-2019  mrg branches: 1.224.4;
adjust fallthru comments to appease gcc7.
 1.223 28-Jan-2019  martin Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.222 16-Dec-2018  christos sbspace() does not return negative values anymore and that broke OOB data
sending. Instead of depending on negative values, account for the 1024
bytes sosend() adds so that it can use all the space here in a separate
function sbspace_oob(). Idea from mlelstv@
 1.221 24-Nov-2018  maxv KNF, no functional change.
 1.220 24-Nov-2018  maxv Fix kernel pointer leaks in sysctl_inpcblist.
 1.219 03-May-2018  maxv branches: 1.219.2;
Remove now unused tcpip.h includes. Some were already unused before.
 1.218 07-Apr-2018  maxv Remove dead code.
 1.217 29-Mar-2018  maxv Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to
understand.

Also make tcp6_mtudisc() static in tcp_subr.c.
 1.216 15-Aug-2017  christos branches: 1.216.2;
add some more getsockopt(2) params
 1.215 28-Jul-2017  maxv Remove TCP_COMPAT_42. This feature is a workaround for a bug in the TCP
stack of BSD4.2. Having such features just does not make any sense, and
looking at the code, I'm not sure it actually works.
 1.214 24-Jan-2017  ozaki-r branches: 1.214.6;
Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.213 18-Nov-2016  knakahara branches: 1.213.2;
fix: "ifconfig destroy" can stall when "ifconfig" is run in parallel.
This problem occurs only if NET_MPSAFE is on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() tries to acquire softnet_lock.
These can cause a deadlock.
 1.212 26-Apr-2016  ozaki-r branches: 1.212.2;
Sweep unnecessary route.h inclusions
 1.211 15-Feb-2016  rtr Reduce code duplication.

Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended. As posted to tech-net@
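
For reference, building an IPv4-mapped IPv6 address (::ffff:a.b.c.d) looks roughly like this in portable C. The function name is hypothetical and is not the one introduced by the commit.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: fill *in6 with the IPv4-mapped form of *in. */
static inline void
in4_to_mapped_sketch(const struct in_addr *in, struct in6_addr *in6)
{
	assert(in != NULL && in6 != NULL);
	memset(in6, 0, sizeof(*in6));      /* bytes 0..9 are zero */
	in6->s6_addr[10] = 0xff;           /* bytes 10..11 are 0xffff */
	in6->s6_addr[11] = 0xff;
	memcpy(&in6->s6_addr[12], in, sizeof(*in));  /* bytes 12..15: IPv4 */
}
```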
 1.210 14-Feb-2016  rtr remove duplicated #include of <netinet/in.h>
 1.209 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.208 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from mbuf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.207 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.206 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.205 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.204 31-Mar-2015  ozaki-r Remove unnecessary opt_ipsec.h inclusions
 1.203 14-Feb-2015  he Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).
 1.202 10-Nov-2014  maxv branches: 1.202.2;
Do not uselessly include <sys/malloc.h>.
 1.201 18-Oct-2014  snj src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.200 09-Aug-2014  rtr branches: 1.200.2;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.199 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.198 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.197 05-Aug-2014  rtr tcp_getpcb() is nearly always called upon entry to usrreqs so
KASSERT(solocked(so)) inside it and remove the redundant KASSERT
everywhere we are using tcp_getpcb()
 1.196 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.195 02-Aug-2014  rtr restore splsoftnet() in various usrreqs that were removed during the PRU
splits. we will properly review removal after the PRU split work is
complete.
 1.194 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.193 30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.192 30-Jul-2014  rtr put boilerplate extraction of inpcb or in6pcb and tcpcb performed in tcp
usrreqs into a function that can be called instead of cut & pasting it
to every single usrreq function.

tcp_getpcb(struct socket *, struct inpcb **, struct in6pcb **, struct tcpcb **)

* examines the family of the provided socket and fills in either inpcb
or in6pcb and tcpcb.
* if the pcb is not present for the family of the socket EINVAL is
returned, if the family is not AF_INET{,6} EAFNOSUPPORT is returned.

signature provided by and patch reviewed by rmind
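
The return-value contract described above can be sketched with stand-in types. Everything here (the struct, the function name, the void * PCB pointers) is a hypothetical simplification; only the EINVAL and EAFNOSUPPORT behaviour comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for struct socket. */
struct sock_sketch {
	int   family;  /* AF_INET, AF_INET6, ... */
	void *pcb;     /* protocol control block, may be NULL */
};

/* Fill in exactly one of *pcb4 or *pcb6 according to the socket family. */
static inline int
getpcb_sketch(const struct sock_sketch *so, void **pcb4, void **pcb6)
{
	assert(so != NULL && pcb4 != NULL && pcb6 != NULL);
	*pcb4 = NULL;
	*pcb6 = NULL;

	switch (so->family) {
	case AF_INET:
		*pcb4 = so->pcb;
		break;
	case AF_INET6:
		*pcb6 = so->pcb;
		break;
	default:
		return EAFNOSUPPORT;  /* not an inet socket at all */
	}
	return so->pcb == NULL ? EINVAL : 0;  /* family ok, but no PCB */
}
```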
 1.191 24-Jul-2014  rtr cleanup after last commit

- add KASSERT(req != PRU_BIND) and KASSERT(req != PRU_LISTEN) inside
tcp_usrreq() as these reqs should no longer reach here.
- remove (now unreachable) PRU_LISTEN case in switch.
 1.190 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.189 23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.188 14-Jul-2014  rtr pr_generic() for req = PRU_RCVOOB is always called with control == NULL
so don't bother with a conditional block that handles non-NULL, it
doesn't happen.
 1.187 10-Jul-2014  rmind tcp_accept: simplify a little.
 1.186 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.185 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.184 07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.183 07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifies a return of
success.

additional review will be done to determine if the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.182 07-Jul-2014  rtr * have pr_stat return EOPNOTSUPP consistently for all protocols that do
not fill in struct stat instead of returning success.

* in pr_stat remove all checks for non-NULL so->so_pcb except where the
pcb is actually used (i.e. cases where we don't return EOPNOTSUPP).

proposed on tech-net@
 1.181 06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.180 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.179 23-Jun-2014  rtr where appropriate rename xxx_ioctl() struct mbuf * parameters from
`control' to `ifp' after split from xxx_usrreq().

sys_socket.c
fix wrapping of arguments to be consistent with other function calls
in the file after replacing pr_usrreq() call with pr_ioctl() which
required one less argument.

link_proto.c
fix indentation of parameters in link_ioctl() prototype to be
consistent with the rest of the file.

discussed with rmind@
 1.178 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.177 22-May-2014  rmind tcp_usrreq: fix the previous - the assert is still not true (but PRU_SENSE
case will handle it); eventually, pr_usrreq should not be called without
PCB attached.
 1.176 21-May-2014  rmind tcp_usrreq: fix the previous correctly - restore the assert logic,
but move it after the PRU_SENSE check.
 1.175 21-May-2014  pgoyette Restore original sense of the check, and allow both inp and in6p to be
NULL. This case is explicitly handled below.
 1.174 20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.173 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.172 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.171 25-Feb-2014  pooka branches: 1.171.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.170 02-Dec-2013  kefren Update TCP CB with new values on rfc1323 and mssdflt sysctl updates
From yasuoka@iij.ad.jp in kern/44254
 1.169 23-Nov-2013  christos convert from CIRCLEQ to TAILQ.
 1.168 04-Oct-2013  christos PR/48098: Brian Marcotte: Avoid kernel assertion for embryonic sockets that
don't have credentials yet.
XXX: pullup-6
 1.167 15-Sep-2013  martin Remove unused variables
 1.166 10-Apr-2013  christos branches: 1.166.4;
Limit the TCP initial window setting to 10, leaving the default at 4
and simplifying the code in the process. Per draft-ietf-initcwnd-08.txt.
 1.165 02-Jun-2012  dsl branches: 1.165.2;
Add some pre-processor magic to verify that the type of the data item
passed to sysctl_createv() actually matches the declared type for
the item itself.
In the places where the caller specifies a function and a structure
address (typically the 'softc') an explicit (void *) cast is now needed.
Fixes bugs in sys/dev/acpi/asus_acpi.c sys/dev/bluetooth/bcsp.c
sys/kern/vfs_bio.c sys/miscfs/syncfs/sync_subr.c and setting
AcpiGbl_EnableAmlDebugObject.
(mostly passing the address of a uint64_t when typed as CTLTYPE_INT).
I've test built quite a few kernels, but there may be some unfixed MD
fallout. Most likely passing &char[] to char *.
Also add CTLFLAG_UNSIGNED for unsigned decimals - not set yet.
 1.164 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.163 17-Mar-2012  christos PR/46077: M. Nunberg: Stat should not fail on connecting socket.
 1.162 02-Feb-2012  tls branches: 1.162.2;
Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.161 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.160 06-Jun-2011  dyoung branches: 1.160.2; 1.160.6;
Don't allocate resources for vtw until/unless it is enabled. This will
further help those machines where memory is in short supply.

TBD: release resources after vtw is disabled and all entries have
expired.
 1.159 03-May-2011  dyoung branches: 1.159.2;
Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
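
The class selection can be sketched as follows for IPv4. The classifier is a hypothetical simplification (it assumes a single local netmask and host-byte-order addresses); only the three classes and their default MSLs of 2, 10 and 60 seconds come from the text above.

```c
#include <assert.h>
#include <stdint.h>

enum peer_class { PEER_LOOPBACK, PEER_LOCAL, PEER_REMOTE };

/* Hypothetical classifier; addresses and mask are in host byte order. */
static inline enum peer_class
classify_peer(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	if (laddr == faddr)
		return PEER_LOOPBACK;  /* local host equals remote host */
	if ((laddr & netmask) == (faddr & netmask))
		return PEER_LOCAL;     /* same link/subnet */
	return PEER_REMOTE;            /* reached via a gateway */
}

/* Default MSL, in seconds, for each class as listed in the commit message. */
static inline int
class_msl_seconds(enum peer_class c)
{
	switch (c) {
	case PEER_LOOPBACK:
		return 2;
	case PEER_LOCAL:
		return 10;
	case PEER_REMOTE:
		return 60;
	}
	assert(!"unknown peer class");
	return 60;
}
```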

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
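
The index-instead-of-pointer technique and the oldest-first discard can be illustrated with a toy pool. This is a sketch under strong assumptions, not the VTW implementation: the sizes, names, single integer key and round-robin slot reuse are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VP_NSLOT   4u          /* hypothetical pool size */
#define VP_NBUCKET 2u          /* hypothetical hash table size */
#define VP_NONE    UINT16_MAX  /* "null" index */

struct vestige {
	uint32_t key;   /* stand-in for the connection 4-tuple */
	uint16_t next;  /* 16-bit index of next entry, not a pointer */
	bool     used;
};

struct vpool {
	struct vestige slot[VP_NSLOT];
	uint16_t bucket[VP_NBUCKET];  /* chain heads, stored as indices */
	uint16_t next_alloc;          /* slots are reused oldest first */
};

static inline void
vp_init(struct vpool *p)
{
	for (unsigned i = 0; i < VP_NSLOT; i++)
		p->slot[i].used = false;
	for (unsigned i = 0; i < VP_NBUCKET; i++)
		p->bucket[i] = VP_NONE;
	p->next_alloc = 0;
}

/* Remove slot idx from its hash chain. */
static inline void
vp_unlink(struct vpool *p, uint16_t idx)
{
	uint16_t *link = &p->bucket[p->slot[idx].key % VP_NBUCKET];

	while (*link != idx) {
		assert(*link != VP_NONE);
		link = &p->slot[*link].next;
	}
	*link = p->slot[idx].next;
	p->slot[idx].used = false;
}

static inline void
vp_insert(struct vpool *p, uint32_t key)
{
	uint16_t idx = p->next_alloc;
	uint16_t *head;

	p->next_alloc = (uint16_t)((idx + 1u) % VP_NSLOT);
	if (p->slot[idx].used)
		vp_unlink(p, idx);  /* pool full: discard the oldest entry */
	head = &p->bucket[key % VP_NBUCKET];
	p->slot[idx] = (struct vestige){ .key = key, .next = *head, .used = true };
	*head = idx;
}

static inline bool
vp_lookup(const struct vpool *p, uint32_t key)
{
	for (uint16_t i = p->bucket[key % VP_NBUCKET]; i != VP_NONE;
	    i = p->slot[i].next)
		if (p->slot[i].key == key)
			return true;
	return false;
}
```

With a 16-bit index each link costs two bytes instead of a full pointer, which is where the memory saving described above comes from.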

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.158 30-Dec-2009  elad branches: 1.158.4; 1.158.6;
Get the uid from the socket's credentials.
 1.157 16-Sep-2009  pooka Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.156 09-Sep-2009  darran Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
 1.155 07-Jun-2009  rmind sysctl_inpcblist: fix a lock leak in error path (hi <matt>).
 1.154 17-Apr-2009  elad Pass the lwp argument to in6_pcbbind() for the INET6 PRU_LISTEN and
PRU_CONNECT cases of tcp_usrreq(). It seems they were forgotten a long
time ago.

Similar code in FreeBSD and OpenBSD passes the thread (credentials)/proc.
 1.153 15-Apr-2009  elad Remove a few KAUTH_GENERIC_ISSUSER in favor of more descriptive
alternatives.

Discussed on tech-kern:

http://mail-index.netbsd.org/tech-kern/2009/04/11/msg004798.html

Input from ad@, christos@, dyoung@, tsutsui@.

Okay ad@.
 1.152 11-Mar-2009  mrg like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide.

while here, slightly normalise the previous to init_sysctl.c.
 1.151 18-Feb-2009  yamt sysctl_net_inet_ip_ports: fix ipv6 sysctls.
 1.150 06-Nov-2008  dyoung branches: 1.150.4;
Cosmetic: change (type *)0 to NULL.
 1.149 11-Oct-2008  pooka branches: 1.149.2; 1.149.4; 1.149.6;
Move uidinfo to its own module in kern_uidinfo.c and include in rump.
No functional change to uidinfo.
 1.148 20-Aug-2008  matt Make the sysctl routines take out softnet_lock before dealing with
any data structures.

Change inet6ctlerrmap and zeroin6_addr to const.
 1.147 06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.146 04-May-2008  thorpej branches: 1.146.2; 1.146.6;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.145 28-Apr-2008  martin Remove clause 3 and 4 from TNF licenses
 1.144 24-Apr-2008  ad branches: 1.144.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.143 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.142 12-Apr-2008  thorpej branches: 1.142.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
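
Collating per-CPU counters on demand can be sketched like this. The struct, the counter count and the function name are hypothetical; the real code works through <net/net_stats.h> and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSTATS_SKETCH 3  /* hypothetical number of counters */

/* One private counter block per CPU, so updates need no shared lock. */
struct percpu_stats {
	uint64_t counter[NSTATS_SKETCH];
};

/* Sum every CPU's counters into out[]; run only when a reader asks. */
static inline void
stats_collate(const struct percpu_stats *cpus, size_t ncpu,
    uint64_t out[NSTATS_SKETCH])
{
	assert(cpus != NULL && out != NULL);
	for (size_t s = 0; s < NSTATS_SKETCH; s++) {
		out[s] = 0;
		for (size_t c = 0; c < ncpu; c++)
			out[s] += cpus[c].counter[s];
	}
}
```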
 1.141 08-Apr-2008  thorpej Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
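
Why an array of uint64_t can stay ABI-compatible with a structure of uint64_t fields is shown in the sketch below. The three field names and index constants are a hypothetical subset chosen for illustration, not the actual tcpstat layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old-style structure: every member is a uint64_t. */
struct old_stat_sketch {
	uint64_t connattempt;
	uint64_t accepts;
	uint64_t connects;
};

/* New-style indices, listed in the same order as the old members. */
enum { ST_CONNATTEMPT, ST_ACCEPTS, ST_CONNECTS, ST_NSTATS };

/* Same total size and same offsets means old readers see the same bytes. */
_Static_assert(sizeof(struct old_stat_sketch) == ST_NSTATS * sizeof(uint64_t),
    "array and struct must have the same size");
_Static_assert(offsetof(struct old_stat_sketch, accepts) ==
    ST_ACCEPTS * sizeof(uint64_t), "member offset must match index");

/* What an old binary effectively does with the bytes it receives. */
static inline struct old_stat_sketch
as_old_view(const uint64_t stats[ST_NSTATS])
{
	struct old_stat_sketch old;

	assert(stats != NULL);
	memcpy(&old, stats, sizeof(old));
	return old;
}
```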
 1.140 16-Dec-2007  elad branches: 1.140.6;
Really fix low port allocation, by always passing a valid lwp to
in_pcbbind().

Okay dyoung@.

Note that the network code is another candidate for major cleanup... also
note that this issue is likely to be present in netinet6 code, too.
 1.139 27-Nov-2007  christos branches: 1.139.2; 1.139.6;
require that the options argument is the right size, not that it is greater
or equal to the requested size. Suggested by Matt Thomas.
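
The stricter check amounts to comparing with != rather than <. A minimal sketch, with a hypothetical helper name and an int-valued option assumed:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: accept an int option only if the size is exact. */
static inline int
opt_get_int(const void *optval, size_t optlen, int *out)
{
	assert(out != NULL);
	if (optval == NULL || optlen != sizeof(int))
		return EINVAL;  /* too small and too large are both rejected */
	memcpy(out, optval, sizeof(int));
	return 0;
}
```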
 1.138 04-Nov-2007  rmind branches: 1.138.2;
Pick the smallest possible TCP window scaling factor that will still allow
us to scale up to sb_max. This might fix the problems with some firewalls.

Taken from FreeBSD (silby).
OK by <dyoung>.
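
Picking the smallest sufficient scaling factor can be sketched as a short loop. The constants 65535 (largest unscaled window) and 14 (largest shift permitted by the window scale option) are standard TCP limits; the function and macro names are hypothetical.

```c
#include <assert.h>

#define MAXWIN_SKETCH      65535UL  /* largest unscaled TCP window */
#define MAX_WINSHIFT_SKETCH 14      /* largest permitted window shift */

/* Smallest shift such that (65535 << shift) can cover sb_max bytes. */
static inline int
pick_wscale(unsigned long sb_max)
{
	int scale = 0;

	while (scale < MAX_WINSHIFT_SKETCH &&
	    (MAXWIN_SKETCH << scale) < sb_max)
		scale++;
	assert(scale >= 0 && scale <= MAX_WINSHIFT_SKETCH);
	return scale;
}
```

For example, a 256 KiB sb_max needs a shift of 3, since 65535 << 2 is 262140, which falls just short of 262144.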
 1.137 19-Sep-2007  dyoung branches: 1.137.4;
1) Introduce a new socket option, (SOL_SOCKET, SO_NOHEADER), that
tells a socket that it should both add a protocol header to tx'd
datagrams and remove the header from rx'd datagrams:

int onoff = 1, s = socket(...);
setsockopt(s, SOL_SOCKET, SO_NOHEADER, &onoff, sizeof(onoff));

2) Add an implementation of (SOL_SOCKET, SO_NOHEADER) for raw IPv4
sockets.

3) Reorganize the protocols' pr_ctloutput implementations a bit.
Consistently return ENOPROTOOPT when an option is unsupported,
and EINVAL if a supported option's arguments are incorrect.
Reorganize the flow of code so that it's more clear how/when
options are passed down the stack until they are handled.

Shorten some pr_ctloutput staircases for readability.

4) Extract common mbuf code into subroutines, add new sockaddr
methods, and introduce a new subroutine, fsocreate(), for reuse
later; use it first in sys_socket():

struct mbuf *m_getsombuf(struct socket *so)

Create an mbuf and make its owner the socket `so'.

struct mbuf *m_intopt(struct socket *so, int val)

Create an mbuf, make its owner the socket `so', put the
int `val' into it, and set its length to sizeof(int).


int fsocreate(..., int *fd)

Create a socket, a la socreate(9), put the socket into the
given LWP's descriptor table, return the descriptor at `fd'
on success.

void *sockaddr_addr(struct sockaddr *sa, socklen_t *slenp)
const void *sockaddr_const_addr(const struct sockaddr *sa, socklen_t *slenp)

Extract a pointer to the address part of a sockaddr. Write
the length of the address part at `slenp', if `slenp' is
not NULL.

socklen_t sockaddr_getlen(const struct sockaddr *sa)

Return the length of a sockaddr. This just evaluates to
sa->sa_len. I only add this for consistency with code that
appears in a portable userland library that I am going to
import.

const struct sockaddr *sockaddr_any(const struct sockaddr *sa)

Return the "don't care" sockaddr in the same family as
`sa'. This is the address a client should sobind(9) if it
does not care about the source address and, if applicable, the
port et cetera that it uses.

const void *sockaddr_anyaddr(const struct sockaddr *sa, socklen_t *slenp)

Return a pointer to the address part of the "don't care" sockaddr
in the same family as `sa'. Write the length of the address part
at `slenp', if `slenp' is not NULL.
 1.136 02-Aug-2007  rmind branches: 1.136.2; 1.136.4; 1.136.6;
TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are badly needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.135 28-Jun-2007  christos branches: 1.135.2;
Handle mapped and scoped ipv6 addresses. From Anon Ymous.
 1.134 26-Jun-2007  xtraeme Protect inet6_ident_core() with #ifdef INET6, fixes building without
options INET6.
 1.133 25-Jun-2007  christos tcpdrop kernel bits (from anon ymous)
 1.132 20-Jun-2007  christos - per socket keepalive settings
- settable connection establishment timeout
 1.131 04-Mar-2007  christos branches: 1.131.2; 1.131.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.130 06-Dec-2006  yamt branches: 1.130.2;
add some more tcp mowners.
 1.129 10-Nov-2006  yamt branches: 1.129.2; 1.129.4; 1.129.8;
tcp_ctloutput: when called for a socket which is not AF_INET or AF_INET6,
panic rather than returning and possibly leaking an mbuf.
 1.128 19-Oct-2006  rpaulo Use a better way to create sysctl subtrees for ECN and Congctl.
Inspired by the ABC subtree.
 1.127 19-Oct-2006  yamt implement RFC3465 appropriate byte counting.
from Kentaro A. Kurahone, with minor adjustments by me.
the ack prediction part of the original patch was omitted because
it's a separate change. reviewed by Rui Paulo.
 1.126 16-Oct-2006  rpaulo Export the tcp_do_rfc1948 variable to userland via sysctl.
The code to generate an ISS via an MD5 hash has been present in the
NetBSD kernel since 2001, but it wasn't even exported to userland at
that time. It was agreed on tech-net with the original author <thorpej>
that we should let the user decide if he wants to enable it or not.
Not enabled by default.
 1.125 13-Oct-2006  elad Introduce KAUTH_REQ_NETWORK_SOCKET_CANSEE. Since we're not gonna be having
credentials on sockets, at least not anytime soon, this is a way to check
if we can "look" at a socket. Later on when (and if) we do have socket
credentials, the interface usage remains the same because we pass the
socket.

This also fixes sysctl for inet/inet6 pcblist.
 1.124 09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.123 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.122 13-Sep-2006  elad branches: 1.122.2;
Don't use KAUTH_RESULT_* where it's not applicable.
Prompted by yamt@.
 1.121 08-Sep-2006  elad First take at security model abstraction.

- Add a few scopes to the kernel: system, network, and machdep.

- Add a few more actions/sub-actions (requests), and start using them as
opposed to the KAUTH_GENERIC_ISSUSER place-holders.

- Introduce a basic set of listeners that implement our "traditional"
security model, called "bsd44". This is the default (and only) model we
have at the moment.

- Update all relevant documentation.

- Add some code and docs to help folks who want to actually use this stuff:

* There's a sample overlay model, sitting on-top of "bsd44", for
fast experimenting with tweaking just a subset of an existing model.

This is pretty cool because it's *really* straightforward to do stuff
you had to use ugly hacks for until now...

* And of course, documentation describing how to do the above for quick
reference, including code samples.

All of these changes were tested for regressions using a Python-based
testsuite that will be (I hope) available soon via pkgsrc. Information
about the tests, and how to write new ones, can be found on:

http://kauth.linbsd.org/kauthwiki

NOTE FOR DEVELOPERS: *PLEASE* don't add any code that does any of the
following:

- Uses a KAUTH_GENERIC_ISSUSER kauth(9) request,
- Checks 'securelevel' directly,
- Checks a uid/gid directly.

(or if you feel you have to, contact me first)

This is still work in progress; It's far from being done, but now it'll
be a lot easier.

Relevant mailing list threads:

http://mail-index.netbsd.org/tech-security/2006/01/25/0011.html
http://mail-index.netbsd.org/tech-security/2006/03/24/0001.html
http://mail-index.netbsd.org/tech-security/2006/04/18/0000.html
http://mail-index.netbsd.org/tech-security/2006/05/15/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/01/0000.html
http://mail-index.netbsd.org/tech-security/2006/08/25/0000.html

Many thanks to YAMAMOTO Takashi, Matt Thomas, and Christos Zoulas for help
stabilizing kauth(9).

Full credit for the regression tests, making sure these changes didn't break
anything, goes to Matt Fleming and Jaime Fournier.

Happy birthday Randi! :)
 1.120 05-Sep-2006  rpaulo branches: 1.120.2;
Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
 1.119 23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.118 16-Jul-2006  elad get rid of CURTAIN() macro. inline the last use of it, together
with a nice XXX comment (assigned to me of course) that we should
be doing this differently.
 1.117 14-May-2006  elad integrate kauth.
 1.116 15-Apr-2006  christos Move pf2 assignment after we've assigned pf.
 1.115 14-Apr-2006  christos Coverity CID 1154: Prevent NULL deref.
 1.114 14-Apr-2006  christos Coverity CID 738: Fix the query size vs. result returning setup.
 1.113 11-Dec-2005  christos branches: 1.113.4; 1.113.6; 1.113.8; 1.113.10; 1.113.12;
merge ktrace-lwp.
 1.112 15-Nov-2005  dsl Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.111 07-Sep-2005  elad branches: 1.111.6;
Implement curtain for AF_INET{,6} PCB lists.
 1.110 06-Sep-2005  rpaulo Correct SYSCTL_DESCR for tcp.debx.
 1.109 06-Sep-2005  rpaulo Implement net.inet{,6}.tcp{,6}.(debug|debx) when TCP_DEBUG is set. They
can be used to ``transliterate protocol trace'' like trpt(8) does.
 1.108 10-Aug-2005  yamt move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.107 05-Aug-2005  elad Add sysctls for IP, ICMP, TCP, and UDP statistics.
 1.106 20-Jun-2005  atatat branches: 1.106.2;
Change the rest of the sysctl subsystem to use const consistently.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
 1.105 09-Jun-2005  atatat Properly fix the constipated lossage wrt -Wcast-qual and the sysctl
code. I know it's not the prettiest code, but it seems to work rather
well in spite of itself.
 1.104 29-May-2005  christos - add const
- remove bogus casts
- avoid nested variables
 1.103 07-May-2005  christos PR/30154: YAMAMOTO Takashi: tcp_close locking botch
chgsbsize() as mentioned in the PR can be called from an interrupt context
via tcp_close(). Avoid calling uid_find() in chgsbsize().
- Instead of storing so_uid in struct socketvar, store *so_uidinfo
- Add a simple lock to struct uidinfo.
 1.102 05-Apr-2005  kurahone Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.

Idea taken from FreeBSD.
 1.101 30-Mar-2005  yamt s of sack is selective, not selection. pointed out by Michael Eriksson.
 1.100 11-Mar-2005  atatat branches: 1.100.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.
 1.99 10-Mar-2005  atatat Make this build without INET6 xor INET (hah!) again.
 1.98 10-Mar-2005  atatat Change types of kern.file2 and net.*.*.pcblist to NODE
 1.97 09-Mar-2005  atatat Add the following nodes to the sysctl tree:

net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist

which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
 1.96 06-Mar-2005  yamt update SYSCTL_DESCR; sack is implemented.
 1.95 02-Mar-2005  mycroft Copyright maintenance.
 1.94 03-Feb-2005  perry ANSIfy function declarations
 1.93 15-Dec-2004  thorpej branches: 1.93.2; 1.93.4;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.92 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.91 18-May-2004  itojun fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
 1.90 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c,
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.89 20-Apr-2004  matt export tcpstates for _KERNEL and remove tcp_usrreq.c's incorrect
declaration.
 1.88 29-Mar-2004  atatat Make these compile without INET. tcp_input probably needs a lot more
work...
 1.87 24-Mar-2004  atatat branches: 1.87.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.86 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.85 22-Oct-2003  thorpej Rather than zeroing a tcpcb structure and filling in all the fields
individually, create a tcpcb template pre-initialized (and pre-zero'd)
with the static and mostly-static tcpcb parameters. The template is
now copied into the new tcpcb, which zeros and initializes most of the
tcpcb in one pass. The template is kept up-to-date as TCP sysctl
variables are changed.

Combined with the previous sb_max change, TCP socket creation is now
25% faster.
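The template approach described in 1.85 can be sketched in plain C. This is a minimal illustration, not the kernel code: `struct conn_cb`, its fields, and both function names are hypothetical stand-ins for the tcpcb and its parameters.

```c
#include <string.h>

/* Hypothetical, much-reduced control block; the real tcpcb has many more fields. */
struct conn_cb {
	int		mss;		/* static parameter, taken from a tunable */
	int		flags;		/* mostly-static parameter */
	unsigned long	snd_una;	/* per-connection state, starts at zero */
	unsigned long	rcv_nxt;	/* per-connection state, starts at zero */
};

/* Pre-zeroed template holding the static and mostly-static parameters. */
static struct conn_cb conn_template;

/* Rebuild the template; meant to be called whenever a tunable changes. */
static void
conn_template_update(int mss, int flags)
{
	memset(&conn_template, 0, sizeof(conn_template));
	conn_template.mss = mss;
	conn_template.flags = flags;
}

/* A single structure copy both zeroes and initializes the new block. */
static void
conn_cb_init(struct conn_cb *cb)
{
	*cb = conn_template;
}
```

The design choice is that the per-field work happens once, when a tunable changes, rather than on every connection setup.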
 1.84 29-Sep-2003  tls Increase default socket-buffer sizes from 16K to 32K. This increases
throughput significantly in a wide variety of test cases, including
local gigabit ethernet with both jumbo and standard frames,
transcontinental (U.S.) connections with e2e bandwidths ranging from
10Mbit/sec to 155Mbit/sec, and on a variety of test connections
between the NetBSD Project public servers and machines in Australia.

The impact of this change is less dramatic for high-delay connections
when Path MTU is in use but still measurable.

For optimal performance on local gigabit networks, a higher socket
buffer size (at least 64K) will still yield a substantial improvement
in performance, but 32K gets us most of the way there in my test
cases, with only a cost of _doubling_ memory use per socket rather
than _quadrupling_ it.

N.B. Windows NT, at least since Win2k SP2, uses a default socket buffer
size (or their analogue thereof) of 64K, which is a useful data
point.
 1.83 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.82 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.81 29-Jun-2003  fvdl branches: 1.81.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.80 29-Jun-2003  simonb Fix a nit in a comment.
 1.79 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.78 26-Jun-2003  christos abuse the mib instead of abusing the new pointer. Idea from simon burge.
It allows the tcp_sysctl_ident to run by non-super-users. No backwards
compatibility provided.
 1.77 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.76 19-Apr-2003  christos PR/2352: Tor Egge: Add sysctl to get uid of connected socket.
 1.75 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.74 22-Oct-2002  simonb Guard use of "ostate" with #ifdef TCP_DEBUG in tcp_usrreq().
Don't put semicolons at the end of "#define token value".
 1.73 03-Jul-2002  thorpej Rename sbappend_stream() to sbappendstream(), per suggestion from
Jonathan Stone.
 1.72 03-Jul-2002  thorpej Make insertion of data into socket buffers O(C):
* Keep pointers to the first and last mbufs of the last record in the
socket buffer.
* Use the sb_lastrecord pointer in the sbappend*() family of functions
to avoid traversing the packet chain to find the last record.
* Add a new sbappend_stream() function for stream protocols which
guarantee that there will never be more than one record in the
socket buffer. This function uses the sb_mbtail pointer to perform
the data insertion. Make TCP use sbappend_stream().

On a profiling run, this makes sbappend of a TCP transmission using
a 1M socket buffer go from 50% of the time to .02% of the time.

Thanks to Bill Sommerfeld and YAMAMOTO Takashi for their debugging
assistance!
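The tail-pointer idea behind this change can be illustrated outside the kernel. The sketch below is an assumption-laden simplification: `struct buf` and `struct stream_queue` are hypothetical stand-ins for an mbuf and a single-record socket buffer, and `queue_append()` is not the real sbappendstream().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: only the chain link and a length. */
struct buf {
	struct buf	*next;
	size_t		len;
};

/* Hypothetical stand-in for a stream socket buffer holding one record. */
struct stream_queue {
	struct buf	*head;	/* first buffer in the record */
	struct buf	*tail;	/* last buffer, cached so append need not walk */
	size_t		cc;	/* total bytes queued */
};

/* Append in constant time by linking after the cached tail. */
static void
queue_append(struct stream_queue *q, struct buf *b)
{
	assert(q != NULL && b != NULL);
	b->next = NULL;
	if (q->tail == NULL)
		q->head = b;		/* queue was empty */
	else
		q->tail->next = b;
	q->tail = b;
	q->cc += b->len;
}
```

Without the cached tail, each append would have to walk the chain from the head, so the cost would grow with the amount of data already queued.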
 1.71 09-Jun-2002  itojun whitespace
 1.70 11-Mar-2002  martin branches: 1.70.4;
KNFify my last change.
 1.69 28-Feb-2002  martin Enforce a lower bound of 32 for tcp_mssdflt.

This avoids kernel crashes when we don't handle nonsensical values
like 0 gracefully. Better check here once beforehand than having to
check for non meaningful values in time critical paths (like tcp_output).

Fixes PR 15709.
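Checking the value once, when it is set, might look like the sketch below. The commit message does not say whether an out-of-range value is rejected or clamped; this sketch assumes rejection with EINVAL. The variable `mssdflt`, its starting value, and `set_mssdflt()` are hypothetical names used only for illustration.

```c
#include <errno.h>

#define MSSDFLT_MIN	32	/* lower bound named in the commit message */

/* Hypothetical tunable; the starting value is only an example. */
static int mssdflt = 536;

/*
 * Validate at the point of setting, so that hot paths can use the
 * value without re-checking it. Returns 0 on success or EINVAL.
 */
static int
set_mssdflt(int newval)
{
	if (newval < MSSDFLT_MIN)
		return EINVAL;
	mssdflt = newval;
	return 0;
}
```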
 1.68 20-Nov-2001  lukem - replace "defopt" with "defparam" for options which must take a value,
as config(8) will warn for value-less defparam options
- minor whitespace/formatting cleanup
- consolidate opt_tcp_recvspace.h and opt_tcp_sendspace.h into opt_tcp_space.h
 1.67 13-Nov-2001  lukem add RCSIDs
 1.66 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.65 10-Sep-2001  thorpej branches: 1.65.2;
Split tcp_timers() into multiple functions, one for each timer,
and call it directly from tcp_slowtimo() (via a table) rather
than going through tcp_userreq().

This will allow us to call TCP timers directly from callouts,
in a future revision.
 1.64 25-Jul-2001  itojun branches: 1.64.2;
allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone access to the pcb (do not reveal an inconsistent one).
sync with kame
 1.63 08-Jul-2001  abs branches: 1.63.2;
Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and
make all usage of tcp_trace dependent on TCP_DEBUG - resulting in
a 31K saving on an INET enabled i386 kernel.
 1.62 03-Jul-2001  itojun call in{,6}_pcbpurgeif0() before in{,6}_purgeif().
 1.61 20-Mar-2001  thorpej Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
hash method of generating TCP ISS values. Note, this code is experimental
and disabled by default (experimental enough that I don't export the
variable via sysctl yet, either). There are a couple of issues I'd
like to discuss with Steve, so this code should only be used by people
who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
4.4BSD, timestamps are created by incrementing the tcp_now variable
at 2 Hz; there's even a company out there that uses this to determine
web server uptime. According to Newsham's paper "The Problem With
Random Increments", while NetBSD's TCP ISS generation method is much
better than the "random increment" method used by FreeBSD and OpenBSD,
it is still theoretically possible to mount an attack against NetBSD's
method if the attacker knows how many times the tcp_iss_seq variable
has been incremented. By not leaking uptime information, we can make
that much harder to determine. So, we avoid the leak by giving each
TCP connection a timebase of 0.
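One way to give each connection a timebase of 0 is to record the global tick counter when the connection is created and subtract it from every timestamp sent. The sketch below assumes that approach; `tick_now`, `struct conn_ts`, and the function names are hypothetical and are not taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical global tick counter, playing the role of tcp_now. */
static uint32_t tick_now;

/* Hypothetical per-connection state: the tick count at creation. */
struct conn_ts {
	uint32_t timebase;
};

/* Record the current tick count when the connection is set up. */
static void
conn_ts_init(struct conn_ts *c)
{
	c->timebase = tick_now;
}

/*
 * Timestamp to put on the wire: ticks since the connection began.
 * Unsigned subtraction keeps this correct across counter wraparound.
 */
static uint32_t
conn_ts_value(const struct conn_ts *c)
{
	return tick_now - c->timebase;
}
```

Because every connection starts counting from zero, an observer sees only the age of the connection, not the time since boot.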
 1.60 11-Feb-2001  itojun branches: 1.60.2;
make sure we call tcp_output() only if we have template.
 1.59 18-Jan-2001  jdolecek constify
 1.58 11-Dec-2000  itojun make sure t_family has the correct protocol family, after connect(2)
and/or bind(2). sync with kame
 1.57 17-Oct-2000  itojun allow INET6-less build.
From: smd@ebone.net (Sean Doran)
 1.56 17-Oct-2000  itojun be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
 1.55 06-Oct-2000  enami Cosmetic changes to previous commit; indent break statement sanely.
 1.54 06-Oct-2000  enami Just call matching purgeif/pcbpurgeif routine for the protocol family.
Without this, if a v6 address is placed before a v4 address in if_addrlist,
a PRU_PURGEIF request for v6 tcp protocol purges also v4 addresses and,
as a result, if_detach fails to request PRU_PURGEIF for v4 protocols
other than tcp.
 1.53 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.52 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.51 05-Jun-2000  itojun branches: 1.51.2;
pass struct proc * down to udp6_output and in6_pcbbind.
 1.50 22-May-2000  itojun branches: 1.50.2;
disallow negative numbers for ratelimit interval (tcp, icmp, icmp6).
 1.49 30-Mar-2000  augustss Remove register declarations.
 1.48 15-Feb-2000  thorpej Add support for rate-limiting RSTs sent in response to no socket for
an incoming packet. Default minimum interval is 10ms. The interval
is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.
 1.47 04-Feb-2000  itojun avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach(). now almost all structures that depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static variables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.

(sync with kame)
 1.46 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.45 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.44 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.43 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.42 09-Jul-1999  thorpej branches: 1.42.2; 1.42.8;
defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.41 02-Jul-1999  itojun avoid "variable not initialized" warnings on some of the platforms.
 1.40 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF documents assume those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.39 10-Sep-1998  tv branches: 1.39.8; 1.39.10;
egcs {brace} warning fix
 1.38 10-Sep-1998  mouse Create tcp.keepidle, tcp.keepintvl, tcp.keepcnt, tcp.slowhz sysctls.
 1.37 06-May-1998  thorpej Use macros from tcp_timer.h to manipulate TCP timers, so that their
implementation can be changed easily.
 1.36 29-Apr-1998  matt New TCP reassembly code. The new code reduces the memory needed by
out-of-order packets and builds the infrastructure needed for sending
SACK blocks (to be added shortly).
 1.35 13-Apr-1998  kml Fix to ensure that the correct MSS is advertised for loopback
TCP connections by using the MTU of the interface. Also added
a knob, mss_ifmtu, to force all connections to use the MTU of
the interface to calculate the advertised MSS.
 1.34 19-Feb-1998  thorpej Update copyright (sigh, should have done this long ago).
 1.33 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.32 05-Jan-1998  thorpej From 4.4BSD-Lite2 (noted by Frank van der Linden):
so_linger is used as an argument to tsleep(), so was stuffed with
clockticks for the TCP linger time. However, so_linger is set directly from
l_linger if the linger time is specified, and l_linger is seconds (although
this is not currently documented anywhere). Fix this to set the TCP
linger time in seconds, and multiply so_linger by hz when tsleep() is
called to actually perform the linger.
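The unit mix-up fixed here is easy to see in a small sketch. The helper name and the `hz` values are illustrative assumptions, not the kernel's code; the point is only that the stored value stays in seconds and is scaled to ticks where the sleep happens.

```c
#include <assert.h>

/*
 * Hypothetical helper: scale a linger time kept in seconds to clock
 * ticks, as would be done just before sleeping.
 */
static int
linger_to_ticks(int linger_seconds, int hz)
{
	assert(linger_seconds >= 0 && hz > 0);
	return linger_seconds * hz;
}
```

With hz at 100, a 120-second linger is 12000 ticks, whereas treating the 120 as ticks would give a linger of only 1.2 seconds.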
 1.31 13-Dec-1997  thorpej After further examination of traces of bulk transfers (with help from
Kevin Lahey), undo the "defer window update until next delayed ACK".
 1.30 11-Dec-1997  thorpej Implement an infrastructure to allow larger initial congestion windows.
The sysctl'able variable "tcp_init_win", when set to 0, selects an
auto-tuning algorithm for selecting the initial window, based on transmit
segment size, per discussion in the IETF tcpimpl working group.

Default initial window is still 1 segment, but will soon become 2 segments,
per discussion in tcpimpl.
 1.29 11-Dec-1997  thorpej In the PRU_RCVD entry point, if TF_DELACK is set, don't send the window
update now, since it will be sent within 200ms when the delayed ACK is
sent. Instrument how many hits we get on this optimization.
 1.28 08-Nov-1997  kml TCP MSS fixes to provide cleaner slow-start and recovery.
 1.27 10-Oct-1997  explorer branches: 1.27.2;
Add hooks to use the kernel random system to generate TCP sequence numbers.
 1.26 28-Jul-1997  thorpej branches: 1.26.2;
Generate dependencies for the TCP_SENDSPACE and TCP_RECVSPACE options.
 1.25 28-Jul-1997  thorpej Make the following tunable via sysctl, inspired by BSD/OS:
- tcp_sendspace
- tcp_recvspace
- tcp_mssdflt
- tcp_syn_cache_limit
- tcp_syn_bucket_limit
- tcp_syn_cache_timer
 1.24 12-Jun-1997  kleink Eliminate a superfluous `if' statement: when detaching the TCP protocol from
a socket, just calling tcp_disconnect() on the tcpcb will do the right thing.
From Thorsten Frueauf <frueauf@ira.uka.de> and W. Richard Stevens in PR/3738
resp. TCP/IP Illustrated, Vol. 2.
 1.23 23-May-1996  mycroft Make sure the control mbufs are freed in all cases.
 1.22 23-May-1996  mycroft Minor changes.
 1.21 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.20 13-Feb-1996  christos branches: 1.20.4;
netinet prototypes
 1.19 31-Jan-1996  mycroft Add a comment describing the previous.
 1.18 31-Jan-1996  mycroft If we close from FIN_WAIT_2 state, make sure we don't leave the socket
around forever if we don't get a final FIN. From Arne Juul, PR 1659.
 1.17 30-Sep-1995  thorpej branches: 1.17.2;
Implement tcp_sysctl(). Add a sysctl option to enable/disable RFC1323
extensions to TCP. From John Kohl <jtk@kolvir.blrc.ma.us>.
 1.16 12-Aug-1995  mycroft splnet --> splsoftnet
 1.15 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.14 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.13 21-Mar-1995  glass Default linger time was 120 clock ticks instead of the intended
2 minutes.
[Bug pointed out by Wright/Stevens in TCP/IP Illustrated Vol II]
 1.12 14-Oct-1994  mycroft Don't return received data to the user until the initial handshake is complete.
Also use TCPS_HAVEESTABLISHED() in a few other places.
 1.11 13-Oct-1994  mycroft Increase the default window size to 16k.
 1.10 29-Jun-1994  cgd branches: 1.10.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8 10-Jan-1994  mycroft Change the counters to be all the same type -- u_long.
 1.7 08-Jan-1994  mycroft Remove some extra prototypes.
 1.6 08-Jan-1994  mycroft Prototypes.
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 15-Jun-1993  cgd branches: 1.4.4;
bump sendspace and recvspace up to 8k each; rod says
these should be safe values...
 1.3 22-May-1993  cgd add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.10.2.1 13-Oct-1994  mycroft Update from trunk.
 1.17.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.20.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.26.2.1 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.27.2.2 09-May-1998  mycroft Pull up patch from kml.
 1.27.2.1 08-Nov-1997  thorpej Pull up from trunk: TCP MSS fixes to provide cleaner slow-start and recovery.
(kml)
 1.39.10.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.39.10.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.39.10.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.39.8.2 02-Aug-1999  thorpej Update from trunk.
 1.39.8.1 01-Jul-1999  thorpej Sync w/ -current.
 1.42.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.42.2.4 27-Mar-2001  bouyer Sync with HEAD.
 1.42.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.42.2.2 13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.42.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.50.2.1 22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.51.2.2 09-Oct-2000  enami Pullup rev. 1.54 and 1.55 (approved by jhawk):
 1.51.2.1 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are a rough estimate - we had some trial and error on the main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.60.2.9 11-Nov-2002  nathanw Catch up to -current
 1.60.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.60.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.60.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.60.2.5 08-Jan-2002  nathanw Catch up to -current.
 1.60.2.4 14-Nov-2001  nathanw Catch up to -current.
 1.60.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.60.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.60.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.63.2.6 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.63.2.5 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.63.2.4 16-Mar-2002  jdolecek Catch up with -current.
 1.63.2.3 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.63.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.63.2.1 03-Aug-2001  lukem update to -current
 1.64.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.65.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.70.4.2 15-Jul-2002  gehenna catch up with -current.
 1.70.4.1 20-Jun-2002  gehenna catch up with -current.
 1.81.2.11 11-Dec-2005  christos Sync with head.
 1.81.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.81.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.81.2.8 08-Mar-2005  skrll Sync with HEAD.
 1.81.2.7 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.81.2.6 04-Feb-2005  skrll Sync with HEAD.
 1.81.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.81.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.81.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.81.2.2 03-Aug-2004  skrll Sync with HEAD
 1.81.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.87.2.1 28-May-2004  tron Pull up revision 1.92 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.93.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.93.4.1 12-Feb-2005  yamt sync with head.
 1.93.2.1 29-Apr-2005  kent sync with -current
 1.100.2.2 06-May-2005  tron Pull up revision 1.102 (requested by kurahone in ticket #199):
Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.
Idea taken from FreeBSD.
 1.100.2.1 01-May-2005  tron Pull up revision 1.101 (requested by yamt in ticket #221):
The "s" in SACK stands for selective, not selection. Pointed out by Michael Eriksson.
 1.106.2.7 21-Jan-2008  yamt sync with head
 1.106.2.6 07-Dec-2007  yamt sync with head
 1.106.2.5 15-Nov-2007  yamt sync with head.
 1.106.2.4 27-Oct-2007  yamt sync with head.
 1.106.2.3 03-Sep-2007  yamt sync with head.
 1.106.2.2 30-Dec-2006  yamt sync with head.
 1.106.2.1 21-Jun-2006  yamt sync with head.
 1.111.6.1 22-Nov-2005  yamt sync with head.
 1.113.12.1 24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.113.10.3 06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.113.10.2 19-Apr-2006  elad sync with head.
 1.113.10.1 08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.113.8.3 14-Sep-2006  yamt sync with head.
 1.113.8.2 11-Aug-2006  yamt sync with head
 1.113.8.1 24-May-2006  yamt sync with head.
 1.113.6.2 01-Jun-2006  kardel Sync with head.
 1.113.6.1 22-Apr-2006  simonb Sync with head.
 1.113.4.3 09-Sep-2006  rpaulo sync with head
 1.113.4.2 05-Feb-2006  rpaulo in6pcb -> inpcb merge.
 1.113.4.1 05-Feb-2006  rpaulo <netinet6/in6_pcb.h> went away. Bye!
 1.120.2.2 12-Jan-2007  ad Sync with head.
 1.120.2.1 18-Nov-2006  ad Sync with head.
 1.122.2.2 10-Dec-2006  yamt sync with head.
 1.122.2.1 22-Oct-2006  yamt sync with head
 1.129.8.1 05-Sep-2009  bouyer Pull up following revision(s) (requested by mlelstv in ticket #1358):
sys/netinet/tcp_usrreq.c: revision 1.148 via patch
Make the sysctl routines raise to splnet() before dealing with
any data structures.
 1.129.4.1 03-Jun-2008  skrll Sync with netbsd-4.
 1.129.2.3 05-Sep-2009  bouyer Pull up following revision(s) (requested by mlelstv in ticket #1358):
sys/netinet/tcp_usrreq.c: revision 1.148 via patch
Make the sysctl routines raise to splnet() before dealing with
any data structures.
 1.129.2.2 29-Jan-2008  pavel Pull up following revision(s) (requested by joerg in ticket #1057):
sys/netinet/tcp_usrreq.c: revision 1.134
Protect inet6_ident_core() with #ifdef INET6, fixes building without
options INET6.
 1.129.2.1 21-Jan-2008  bouyer Pull up following revision(s) (requested by ghen in ticket #1039):
sys/netinet/tcp_var.h: revision 1.148
distrib/sets/lists/comp/mi: revision 1.1035
distrib/sets/lists/man/mi: revision 1.1010
usr.sbin/tcpdrop/Makefile: revision 1.1
usr.sbin/tcpdrop/tcpdrop.c: revision 1.1 - 1.3
usr.sbin/tcpdrop/tcpdrop.8: revision 1.1
usr.sbin/Makefile: revision 1.228 via patch
sys/netinet/tcp_usrreq.c: revision 1.133
distrib/sets/lists/base/mi: revision 1.712
Import tcpdrop(8) from OpenBSD
 1.130.2.1 12-Mar-2007  rmind Sync with HEAD.
 1.131.4.1 11-Jul-2007  mjf Sync with head.
 1.131.2.3 09-Oct-2007  ad Sync with head.
 1.131.2.2 20-Aug-2007  ad Sync with HEAD.
 1.131.2.1 15-Jul-2007  ad Sync with head.
 1.135.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.136.6.2 02-Aug-2007  rmind TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are very needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.136.6.1 02-Aug-2007  rmind file tcp_usrreq.c was added on branch matt-mips64 on 2007-08-02 02:42:42 +0000
 1.136.4.2 09-Jan-2008  matt sync with HEAD
 1.136.4.1 06-Nov-2007  matt sync with HEAD
 1.136.2.3 03-Dec-2007  joerg Sync with HEAD.
 1.136.2.2 04-Nov-2007  jmcneill Sync with HEAD.
 1.136.2.1 02-Oct-2007  joerg Sync with HEAD.
 1.137.4.1 13-Nov-2007  bouyer Sync with HEAD
 1.138.2.2 27-Dec-2007  mjf Sync with HEAD.
 1.138.2.1 08-Dec-2007  mjf Sync with HEAD.
 1.139.6.1 02-Jan-2008  bouyer Sync with HEAD
 1.139.2.1 26-Dec-2007  ad Sync with head.
 1.140.6.3 17-Jan-2009  mjf Sync with HEAD.
 1.140.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.140.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.142.2.1 18-May-2008  yamt sync with head.
 1.144.2.5 11-Mar-2010  yamt sync with head
 1.144.2.4 16-Sep-2009  yamt sync with head
 1.144.2.3 20-Jun-2009  yamt sync with head
 1.144.2.2 04-May-2009  yamt sync with head.
 1.144.2.1 16-May-2008  yamt sync with head.
 1.146.6.2 13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.146.6.1 19-Oct-2008  haad Sync with HEAD.
 1.146.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.149.6.1 17-Jun-2009  bouyer branches: 1.149.6.1.2;
Pull up following revision(s) (requested by rmind in ticket #812):
sys/netinet/tcp_usrreq.c: revision 1.155
sysctl_inpcblist: fix a lock leak in error path (hi <matt>).
 1.149.6.1.2.1 21-Apr-2010  matt sync to netbsd-5
 1.149.4.2 26-Sep-2009  snj Pull up following revision(s) (requested by darran in ticket #950):
sys/netinet/tcp_input.c: revision 1.299
sys/netinet/tcp_usrreq.c: revision 1.156
sys/netinet/tcp_var.h: revision 1.161
Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
 1.149.4.1 17-Jun-2009  bouyer Pull up following revision(s) (requested by rmind in ticket #812):
sys/netinet/tcp_usrreq.c: revision 1.155
sysctl_inpcblist: fix a lock leak in error path (hi <matt>).
 1.149.2.3 28-Apr-2009  skrll Sync with HEAD.
 1.149.2.2 03-Mar-2009  skrll Sync with HEAD.
 1.149.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.150.4.2 23-Jul-2009  jym Sync with HEAD.
 1.150.4.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.158.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.158.4.2 12-Jun-2011  rmind sync with head
 1.158.4.1 31-May-2011  rmind sync with head
 1.159.2.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.160.6.2 05-Apr-2012  mrg sync to latest -current.
 1.160.6.1 18-Feb-2012  mrg merge to -current.
 1.160.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.160.2.2 30-Oct-2012  yamt sync with head
 1.160.2.1 17-Apr-2012  yamt sync with head
 1.162.2.3 14-Dec-2013  bouyer Pull up following revision(s) (requested by kefren in ticket #992):
sys/netinet/tcp_usrreq.c: revision 1.170
Update TCP CB with new values on rfc1323 and mssdflt sysctl updates
From yasuoka-cj7TXg5MjN14Eiagz67IpQ@public.gmane.org in kern/44254
 1.162.2.2 20-Oct-2013  bouyer Pull up following revision(s) (requested by spz in ticket #967):
sys/netinet/tcp_usrreq.c: revision 1.168
PR/48098: Brian Marcotte: Avoid kernel assertion for embryonic sockets that
don't have credentials yet.
XXX: pullup-6
 1.162.2.1 17-Mar-2012  bouyer branches: 1.162.2.1.4; 1.162.2.1.6;
Pull up following revision(s) (requested by jruoho in ticket #124):
sys/netinet/tcp_usrreq.c: revision 1.163
PR/46077: M. Nunberg: Stat should not fail on connecting socket.
 1.162.2.1.6.1 20-Oct-2013  bouyer Pull up following revision(s) (requested by spz in ticket #967):
sys/netinet/tcp_usrreq.c: revision 1.168
PR/48098: Brian Marcotte: Avoid kernel assertion for embryonic sockets that
don't have credentials yet.
XXX: pullup-6
 1.162.2.1.4.1 20-Oct-2013  bouyer Pull up following revision(s) (requested by spz in ticket #967):
sys/netinet/tcp_usrreq.c: revision 1.168
PR/48098: Brian Marcotte: Avoid kernel assertion for embryonic sockets that
don't have credentials yet.
XXX: pullup-6
 1.165.2.3 03-Dec-2017  jdolecek update from HEAD
 1.165.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.165.2.1 23-Jun-2013  tls resync from head
 1.166.4.5 18-May-2014  rmind sync with head
 1.166.4.4 17-Oct-2013  rmind Eliminate some of the splsoftnet() calls, misc clean up.
 1.166.4.3 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.166.4.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.166.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.171.2.1 10-Aug-2014  tls Rebase.
 1.200.2.4 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.200.2.3 11-Sep-2017  snj Pull up following revision(s) (requested by jdolecek in ticket #1498):
sys/netinet/tcp_usrreq.c: revision 1.216
add some more getsockopt(2) params
 1.200.2.2 21-Feb-2015  martin branches: 1.200.2.2.2; 1.200.2.2.6;
Pull up following revision(s) (requested by he in ticket #530):
sys/netinet/tcp_output.c: revision 1.180
sys/netinet/tcp_input.c: revision 1.336
sys/netinet/tcp_usrreq.c: revision 1.203
share/man/man4/tcp.4: revision 1.30
sys/netinet/tcp.h: revision 1.31
sys/netinet/tcp_subr.c: revision 1.258
sys/netinet/tcp_var.h: revision 1.176
sys/netinet/tcp_var.h: revision 1.177
sys/sys/param.h: bump revision

Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).

Change the new counter variables in struct tcpcb to uint32_t, as
per christos' comments.
 1.200.2.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.200.2.2.6.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.200.2.2.2.1 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1676):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/tcp_usrreq.c 1.223 via patch
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.202.2.7 28-Aug-2017  skrll Sync with HEAD
 1.202.2.6 05-Feb-2017  skrll Sync with HEAD
 1.202.2.5 05-Dec-2016  skrll Sync with HEAD
 1.202.2.4 29-May-2016  skrll Sync with HEAD
 1.202.2.3 22-Sep-2015  skrll Sync with HEAD
 1.202.2.2 06-Jun-2015  skrll Sync with HEAD
 1.202.2.1 06-Apr-2015  skrll Sync with HEAD
 1.212.2.2 20-Mar-2017  pgoyette Sync with HEAD
 1.212.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.213.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.214.6.2 29-Jan-2019  msaitoh Pull up following revision(s) (requested by martin in ticket #1175):
sys/net/link_proto.c 1.37
sys/netatalk/ddp_usrreq.c 1.72
sys/netbt/hci_socket.c 1.46
sys/netbt/l2cap_socket.c 1.36
sys/netbt/rfcomm_socket.c 1.38
sys/netbt/sco_socket.c 1.38
sys/netinet/sctp_usrreq.c 1.14
sys/netinet/tcp_usrreq.c 1.223
sys/netinet6/raw_ip6.c 1.173
sys/netinet6/sctp6_usrreq.c 1.17
sys/netinet6/udp6_usrreq.c 1.146
sys/netmpls/mpls_proto.c 1.32
sys/netnatm/natm.c patch

Fix memory leaks pointed out by Ilja Van Sprundel: all
sendoob() functions are expected to free both passed
mbuf chains.
 1.214.6.1 25-Aug-2017  snj Pull up following revision(s) (requested by jdolecek in ticket #216):
sys/netinet/tcp_usrreq.c: revision 1.216
add some more getsockopt(2) params
 1.216.2.5 26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.216.2.4 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.216.2.3 21-May-2018  pgoyette Sync with HEAD
 1.216.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.216.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.219.2.3 21-Apr-2020  martin Sync with HEAD
 1.219.2.2 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.219.2.1 10-Jun-2019  christos Sync with HEAD
 1.224.4.1 10-Sep-2019  martin Pull up following revision(s) (requested by maxv in ticket #193):

sys/netinet/tcp_timer.h: revision 1.30
sys/netinet/tcp_input.c: revision 1.415
sys/netinet/tcp_usrreq.c: revision 1.225
sys/netinet/tcp_subr.c: revision 1.283

Clamp tcp timer quantities to reasonable ranges.
 1.225.6.1 20-Apr-2020  bouyer Sync with HEAD
 1.227.2.2 03-Apr-2021  thorpej Sync with HEAD.
 1.227.2.1 14-Dec-2020  thorpej Sync w/ HEAD.
 1.199 03-Dec-2024  andvar s/packlets/packets/ in comment.
 1.198 28-Oct-2022  ozaki-r branches: 1.198.8;
inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.197 20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.196 31-Jul-2021  andvar s/threshhold/threshold
 1.195 08-Mar-2021  christos branches: 1.195.4;
Remove the unused "addin" argument (it was always 0) and go back using
a random iss by default (instead of rfc1948)
 1.194 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
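
The idea of checking on-wire layout at compile time instead of relying on __packed can be sketched as follows. This is only an illustration: struct demo_hdr is a hypothetical header invented for the example (it is not a NetBSD structure), and C11 _Static_assert stands in for the kernel's CTASSERT macro.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wire header, for illustration only.  Every member is
 * naturally aligned, so no padding is expected and __packed is not needed.
 */
struct demo_hdr {
	uint16_t dh_sport;	/* source port */
	uint16_t dh_dport;	/* destination port */
	uint32_t dh_seq;	/* sequence number */
};

/*
 * Compile-time layout checks.  If a compiler ever inserted padding,
 * the build would fail here instead of producing a wrong wire format.
 */
_Static_assert(sizeof(struct demo_hdr) == 8,
    "demo_hdr must be exactly 8 bytes");
_Static_assert(offsetof(struct demo_hdr, dh_dport) == 2,
    "dh_dport must sit at offset 2");
_Static_assert(offsetof(struct demo_hdr, dh_seq) == 4,
    "dh_seq must sit at offset 4");
```

Because the checks are evaluated by the compiler, they cost nothing at run time and avoid the warnings that taking the address of a packed member can trigger.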
 1.193 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.192 05-Mar-2020  riastradh branches: 1.192.4;
Revert "Include opt_diagnostic.h for DIAGNOSTIC."

This did not do what I thought it did. opt_diagnostic.h is only for
the unused _DIAGNOSTIC, which seems like an abortive attempt to
incrementally convert DIAGNOSTIC to an opt_*.h option rather than a
command-line option.
 1.191 05-Mar-2020  riastradh Include opt_diagnostic.h for DIAGNOSTIC.

...at least, in header files, which may not have already included
libkern.h.
 1.190 27-Dec-2018  maxv Remove unused arguments.
 1.189 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.188 03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
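
The truncation hazard described above can be sketched in a few lines. The names demo_uimin, demo_clamp_truncating and demo_clamp_wide are hypothetical and exist only for this example, and the sketch assumes a platform where unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical stand-in for a min() that is defined on unsigned int. */
static unsigned int
demo_uimin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Passing 64-bit values through an unsigned-int min.  The casts spell
 * out the narrowing that the function prototype would otherwise apply
 * implicitly and silently.
 */
static uint64_t
demo_clamp_truncating(uint64_t len, uint64_t limit)
{
	return demo_uimin((unsigned int)len, (unsigned int)limit);
}

/* The same comparison done at full width, with no narrowing. */
static uint64_t
demo_clamp_wide(uint64_t len, uint64_t limit)
{
	return len < limit ? len : limit;
}
```

With len = 2^32 + 1 and limit = 2^33, the wide version returns len unchanged, while the truncating version compares the low 32 bits (1 and 0) and returns 0. A name such as uimin makes the operand type visible at the call site.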
 1.187 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.186 29-Apr-2018  maxv branches: 1.186.2;
Move struct tcpiphdr from tcpip.h to tcp_var.h, to match UDP (udpiphdr in
udp_var.h).

tcpip.h is now empty, and can be removed.
 1.185 28-Mar-2018  maxv Remove two unused args from syn_cache_get().
 1.184 12-Feb-2018  maxv branches: 1.184.2;
Remove unused argument from tcp_signature_getsav.
 1.183 12-Feb-2018  maxv Remove the 'm' argument from syn_cache_respond(); all it does with it is
freeing it, so free in the caller instead.
 1.182 19-Jan-2018  ozaki-r Run tcp_slowtimo in workqueue if NET_MPSAFE

If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.

NFCI for !NET_MPSAFE
 1.181 15-Nov-2017  ozaki-r Make syn_cache_timer static
 1.180 31-Jul-2017  maxv Fix TCPCTL_NAMES, and remove TCPCTL_VARIABLES.
 1.179 28-Jul-2017  maxv Remove TCP_COMPAT_42. This feature is a workaround for a bug in the TCP
stack of BSD4.2. Having such features just does not make any sense, and
looking at the code, I'm not sure it actually works.
 1.178 07-Jul-2017  ozaki-r Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
 1.177 14-Feb-2015  he branches: 1.177.10;
Change the new counter variables in struct tcpcb to uint32_t, as
per christos' comments.
 1.176 14-Feb-2015  he Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).
 1.175 31-Jul-2014  rtr branches: 1.175.2; 1.175.4;
split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.174 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.173 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.172 02-Jan-2014  pooka branches: 1.172.2;
Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.171 12-Nov-2013  kefren * implement TCP CUBIC congestion control algorithm
* move tcp_sack_newack bits inside reno and newreno_fast_retransmit_newack
* notify ECN peer about cwnd shrink in [new]reno_slow_retransmit

Based on the patch proposed on tech-net@ on Nov 7 with minor improvements:
* adapt wmax for no-fast convergence case
* correct cbrt calculation for big window sizes (>750KB)
 1.170 10-Apr-2013  christos branches: 1.170.4;
Limit the tcp initial window setting to 10, leaving it by default to 4
and simplifying the code in the process. Per draft-ietf-initcwnd-08.txt.
 1.169 02-Feb-2012  tls branches: 1.169.6;
Entropy-pool implementation move and cleanup.

1) Move core entropy-pool code and source/sink/sample management code
to sys/kern from sys/dev.

2) Remove use of NRND as test for presence of entropy-pool code throughout
source tree.

3) Remove use of RND_ENABLED in device drivers as microoptimization to
avoid expensive operations on disabled entropy sources; make the
rnd_add calls do this directly so all callers benefit.

4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might
have led to slight entropy overestimation for some sources.

5) Add new source types for environmental sensors, power sensors, VM
system events, and skew between clocks, with a sample implementation
for each.

ok releng to go in before the branch due to the difficulty of later
pullup (widespread #ifdef removal and moved files). Tested with release
builds on amd64 and evbarm and live testing on amd64.
 1.168 31-Oct-2011  yamt branches: 1.168.2; 1.168.6;
tcp_reass_unlock: assertion
 1.167 25-May-2011  gdt Add comment urging a separation of TCP_RTT_SHIFT into separate defines
describing the EWMA calculation and the storage representation.
(No code change.)
 1.166 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
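
The class selection described above might look roughly like the following sketch. The enum, the function names and the two boolean inputs are assumptions made for illustration; only the three classes and their default MSLs (2, 10 and 60 seconds) come from the text, and the real code decides nearness from addresses and routes rather than from flags.

```c
#include <stdbool.h>

/* Hypothetical names; the classes themselves are taken from the text. */
enum demo_msl_class {
	DEMO_MSL_LOOPBACK,	/* local host equals remote host */
	DEMO_MSL_LOCAL,		/* both hosts on the same link/subnet */
	DEMO_MSL_REMOTE		/* reached via one or more gateways */
};

/* Pick the nearest class that applies to the peer. */
static enum demo_msl_class
demo_msl_classify(bool same_host, bool same_link)
{
	if (same_host)
		return DEMO_MSL_LOOPBACK;
	if (same_link)
		return DEMO_MSL_LOCAL;
	return DEMO_MSL_REMOTE;
}

/* Default MSL in seconds for each class, as quoted in the text. */
static int
demo_msl_seconds(enum demo_msl_class c)
{
	switch (c) {
	case DEMO_MSL_LOOPBACK:
		return 2;
	case DEMO_MSL_LOCAL:
		return 10;
	default:
		return 60;
	}
}
```

A session to a peer on the same subnet would therefore use a 10 second MSL instead of the 60 seconds applied to remote peers.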

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
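
A minimal sketch of two ideas from the paragraph above: linking entries
by a narrow index into a fixed-size pool instead of by pointer, and
discarding the oldest entry when the pool is full. The struct layout,
all names, the tiny pool size, and the ring-order reuse are assumptions
made for illustration; they are not the data structures of the
committed VTW code.

```c
#include <stdint.h>
#include <string.h>

#define NSLOTS   4		/* pool capacity, tiny for illustration */
#define NBUCKETS 2
#define NIL      UINT16_MAX	/* "null" index */

struct vpcb {			/* hypothetical compact session record */
	uint32_t faddr;
	uint16_t fport;
	uint16_t lport;
	uint16_t next;		/* next in bucket chain: an index, not a pointer */
	uint8_t  bucket;
	uint8_t  in_use;
};

struct vtw_pool {
	struct vpcb slot[NSLOTS];
	uint16_t bucket[NBUCKETS];	/* chain heads, also indices */
	uint16_t victim;		/* next slot to reuse; cycles oldest first */
};

static void
vtw_init(struct vtw_pool *p)
{
	memset(p, 0, sizeof(*p));
	for (int i = 0; i < NBUCKETS; i++)
		p->bucket[i] = NIL;
}

static unsigned
vtw_hash(uint32_t faddr, uint16_t fport, uint16_t lport)
{
	return (faddr ^ fport ^ lport) % NBUCKETS;
}

/* Remove slot idx from its bucket chain. */
static void
vtw_unlink(struct vtw_pool *p, uint16_t idx)
{
	uint16_t *link = &p->bucket[p->slot[idx].bucket];

	while (*link != NIL && *link != idx)
		link = &p->slot[*link].next;
	if (*link == idx)
		*link = p->slot[idx].next;
	p->slot[idx].in_use = 0;
}

/* Insert a record, discarding the oldest one if the pool is full. */
static uint16_t
vtw_insert(struct vtw_pool *p, uint32_t faddr, uint16_t fport, uint16_t lport)
{
	uint16_t idx = p->victim;
	struct vpcb *v = &p->slot[idx];

	p->victim = (uint16_t)((idx + 1) % NSLOTS);
	if (v->in_use)
		vtw_unlink(p, idx);
	v->faddr = faddr;
	v->fport = fport;
	v->lport = lport;
	v->bucket = (uint8_t)vtw_hash(faddr, fport, lport);
	v->next = p->bucket[v->bucket];
	p->bucket[v->bucket] = idx;
	v->in_use = 1;
	return idx;
}

/* Return 1 if a matching record is present, 0 otherwise. */
static int
vtw_lookup(const struct vtw_pool *p, uint32_t faddr, uint16_t fport,
    uint16_t lport)
{
	uint16_t i;

	for (i = p->bucket[vtw_hash(faddr, fport, lport)]; i != NIL;
	    i = p->slot[i].next) {
		const struct vpcb *v = &p->slot[i];
		if (v->faddr == faddr && v->fport == fport && v->lport == lport)
			return 1;
	}
	return 0;
}
```

Because nothing is freed except by reuse, slot order is also age order
in this sketch, so advancing a single victim index is enough to discard
oldest first. A 16-bit link costs two bytes where a pointer would cost
four or eight.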

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.165 03-May-2011  dyoung *_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
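
The deferral pattern described here might look roughly like the sketch
below. The function and variable names are hypothetical stand-ins, and
the "work" is reduced to clearing a counter; the point is only that the
drain entry point records a request and the timer handler acts on it.

```c
#include <stdbool.h>

static volatile bool drain_needed;	/* hypothetical drain-needed flag */
static int cache_entries = 3;		/* stand-in for reclaimable state */

/* May be called with locks held, so it only records the request. */
static void
proto_drain(void)
{
	drain_needed = true;
}

/* Runs periodically from timer context, where doing the work is safe. */
static void
proto_fasttimo(void)
{
	if (drain_needed) {
		drain_needed = false;
		cache_entries = 0;	/* the real reclaim work would go here */
	}
}
```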
 1.164 20-Apr-2011  gdt Rewrite comments about TCP RTO calculations.

Long ago, the storage representations of srtt and rttvar were changed
from the 4.4BSD scheme, and the comments are out of sync with the
code. This commit rewrites most of the comments that explain the RTO
calculations, and points out some issues in the code.

Joint work with Bev Schwartz of BBN (original analysis and comments),
but I have rewritten and extended them, so errors are mine.

This material is based upon work supported by the Defense Advanced
Research Projects Agency and Space and Naval Warfare Systems Center,
Pacific, under Contract No. N66001-09-C-2073. Approved for Public
Release, Distribution Unlimited
 1.163 14-Apr-2011  yamt comments
 1.162 16-Sep-2009  pooka branches: 1.162.4; 1.162.6;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.161 09-Sep-2009  darran Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
 1.160 27-May-2009  pooka POOL_INIT -> pool_init
 1.159 29-Jan-2009  pooka branches: 1.159.2;
stinkset purge: POOL_INIT -> pool_init
also, make the syncache pool static in scope
 1.158 06-Aug-2008  plunky branches: 1.158.2; 1.158.4; 1.158.10;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.157 28-Apr-2008  martin branches: 1.157.2; 1.157.6;
Remove clause 3 and 4 from TNF licenses
 1.156 24-Apr-2008  ad branches: 1.156.2;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.155 12-Apr-2008  thorpej branches: 1.155.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.154 08-Apr-2008  thorpej Change TCP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old tcpstat structure; old netstat
binaries will continue to work properly.
 1.153 29-Feb-2008  matt Rework tcp congctl selection code so that the congctl entries can be const.
Don't access tcp_congctl stuff outside of tcp_congctl.c, use routines to
update t_congctl. This code is now slightly more complicated.
 1.152 27-Feb-2008  matt Convert stragglers to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.151 25-Dec-2007  perry branches: 1.151.2; 1.151.6;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.150 02-Aug-2007  rmind branches: 1.150.4; 1.150.10; 1.150.12; 1.150.16; 1.150.20;
TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are badly needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.149 09-Jul-2007  ad branches: 1.149.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.148 25-Jun-2007  christos tcpdrop kernel bits (from anon ymous)
 1.147 20-Jun-2007  christos - per socket keepalive settings
- settable connection establishment timeout
 1.146 02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.145 04-Mar-2007  christos branches: 1.145.2; 1.145.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.144 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.143 06-Dec-2006  yamt branches: 1.143.2;
add some more tcp mowners.
 1.142 06-Dec-2006  yamt - make tcp_reass static.
- constify.
 1.141 21-Oct-2006  yamt branches: 1.141.2; 1.141.4;
- constify.
- make tcp_dooptions and tcpipqent_pool static.
 1.140 19-Oct-2006  yamt implement RFC3465 appropriate byte counting.
from Kentaro A. Kurahone, with minor adjustments by me.
the ack prediction part of the original patch was omitted because
it's a separate change. reviewed by Rui Paulo.
 1.139 16-Oct-2006  rpaulo Export the tcp_do_rfc1948 variable to userland via sysctl.
The code to generate an ISS via an MD5 hash has been present in the
NetBSD kernel since 2001, but it wasn't even exported to userland at
that time. It was agreed on tech-net with the original author <thorpej>
that we should let the user decide if he wants to enable it or not.
Not enabled by default.
 1.138 09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to select a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.137 05-Sep-2006  rpaulo branches: 1.137.2; 1.137.4;
Import of TCP ECN algorithm for congestion control.
Both available for IPv4 and IPv6.
Basic implementation test results are available at
http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.

Work sponsored by the Google Summer of Code project 2006.
Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their
help, comments and support during the project.
 1.136 22-Jul-2006  rpaulo revert stuff that shouldn't have gone in.
 1.135 22-Jul-2006  rpaulo TCP RFC is 793, not 783.
 1.134 16-Feb-2006  perry branches: 1.134.2;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.133 24-Dec-2005  perry branches: 1.133.2; 1.133.4; 1.133.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.132 11-Dec-2005  christos merge ktrace-lwp.
 1.131 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.130 06-Sep-2005  rpaulo Implement tcp.inet{,6}.tcp{,6}.(debug|debx) when TCP_DEBUG is set. They
can be used to ``transliterate protocol trace'' like trpt(8) does.
 1.129 10-Aug-2005  yamt move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.128 05-Aug-2005  elad Add sysctls for IP, ICMP, TCP, and UDP statistics.
 1.127 19-Jul-2005  christos Implement PMTU checks from:

http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html

1. Don't act on ICMP-need-frag immediately if adhoc checks on the
advertised MTU fail. The MTU update is delayed until a TCP retransmit
happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.

From OpenBSD.
 1.126 29-May-2005  christos branches: 1.126.2;
- add const
- remove bogus casts
- avoid nested variables
 1.125 05-Apr-2005  kurahone Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.

Idea taken from FreeBSD.
 1.124 29-Mar-2005  yamt protect tcpipqent with splvm.
 1.123 16-Mar-2005  yamt branches: 1.123.2;
simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq.
the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]),
build it directly from the segment list when needed.
 1.122 16-Mar-2005  yamt - use full sized segments unless we actually have SACKs to send.
- avoid TSO duplicate D-SACK.
- send SACKs regardless of TF_ACKNOW.
- don't clear rcv_sack_num when transmitting.

discussed on tech-net@.
 1.121 09-Mar-2005  atatat gc the tcp_sysctl() prototype since it's completely vestigial
 1.120 02-Mar-2005  mycroft Copyright maintenance.
 1.119 28-Feb-2005  jonathan Commit TCP SACK patches from Kentaro A. Kurahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond
to SACKs in wide-area traffic.

There are two independently-observed, as-yet-unresolved anomalies:
first, unexplained delays in fast retransmission
(potentially explainable by a 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unexplained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over. Current hypothesis
is that the anomalies above may in fact be due to link/level (hardware,
driver, HAL, firmware) aberrations in the test setup, affecting both
Kentaro's wired-Ethernet NIC and in my two (different) WiFi NICs.
 1.118 06-Feb-2005  pk Update tcp_trace() prototype to match implementation.
 1.117 27-Jan-2005  mycroft Introduce a new state variable, t_partialacks. It has 3 states:
* t_partialacks<0 means we are not in fast recovery.
* t_partialacks==0 means we are in fast recovery, but we have not received
any partial acks yet.
* t_partialacks>0 means we are in fast recovery, and we have received
partial acks.

This is used to implement 2 changes in RFC 3782:
* We keep the notion that we are in fast recovery separate from t_dupacks, so
it is not reset due to out-of-order acks. (This affects both the Reno and
NewReno cases.)
* We only reset the retransmit timer on the first partial ack -- preventing us
from possibly taking one RTO per segment once fast recovery is initiated.

As before, it is hard to measure any difference between Reno and NewReno in the
real-world cases that I've tested.
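
The three states of t_partialacks listed above can be read as a small
state machine. The sketch below uses a stand-in struct and hypothetical
helper names; only the field name and the meaning of its negative, zero,
and positive values come from the commit text. partial_ack() assumes it
is called only while in fast recovery.

```c
#include <stdbool.h>

struct tcpcb_sketch {		/* stand-in for the real tcpcb */
	int t_partialacks;
};

/* t_partialacks < 0 means we are not in fast recovery. */
static bool
in_fast_recovery(const struct tcpcb_sketch *tp)
{
	return tp->t_partialacks >= 0;
}

/* Entering fast recovery: no partial acks seen yet. */
static void
enter_fast_recovery(struct tcpcb_sketch *tp)
{
	tp->t_partialacks = 0;
}

static void
exit_fast_recovery(struct tcpcb_sketch *tp)
{
	tp->t_partialacks = -1;
}

/*
 * Account for a partial ack.  Returns true only for the first one,
 * which is the only one that should reset the retransmit timer.
 */
static bool
partial_ack(struct tcpcb_sketch *tp)
{
	bool first = (tp->t_partialacks == 0);

	tp->t_partialacks++;
	return first;
}
```

Keeping this state in its own field is what lets t_dupacks be reset by
out-of-order acks without losing track of fast recovery.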
 1.116 26-Jan-2005  mycroft Fix two problems in our TCP stack:

1) If an echoed RFC 1323 time stamp appears to be later than the current time,
ignore it and fall back to old-style RTT calculation. This prevents ending
up with a negative RTT and panicking later.

2) Fix NewReno. This involves a few changes:

a) Implement the send_high variable in RFC 2582. Our implementation is
subtly different; it is one *past* the last sequence number transmitted
rather than being equal to it. This simplifies some logic and makes
the code smaller. Additional logic was required to prevent sequence
number wraparound problems; this is not mentioned in RFC 2582.

b) Make sure we reset t_dupacks on new acks, but *not* on a partial ack.
All of the new ack code is pushed out into tcp_newreno(). (Later this
will probably be a pluggable function.) Thus t_dupacks keeps track of
whether we're in fast recovery all the time, with Reno or NewReno, which
keeps some logic simpler.

c) We do not need to update snd_recover when we're not in fast recovery.
See tech-net for an explanation of this.

d) In the gratuitous fast retransmit prevention case, do not send a packet.
RFC 2582 specifically says that we should "do nothing".

e) Do not inflate the congestion window on a partial ack. (This is done by
testing t_dupacks to see whether we're still in fast recovery.)

This brings the performance of NewReno back up to the same as Reno in a few
random test cases (e.g. transferring peer-to-peer over my wireless network).
I have not concocted a good test case for the behavior specific to NewReno.
 1.115 21-Dec-2004  yamt branches: 1.115.2; 1.115.4;
factor out receive side tcp/udp checksum handling code so that they
can be used by eg. packet filters.

reviewed by Christos Zoulas on tech-net@.
(slightly tweaked since then to make tcp and udp similar.)
 1.114 15-Dec-2004  thorpej Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.113 15-Sep-2004  yamt fix ipqent pool corruption problems. make tcp reass code use
its own pool of ipqent rather than sharing it with ip reass code.
PR/24782.
 1.112 18-May-2004  itojun fix MD5 signature support to actually validate inbound signature, and
drop packet if fails.
 1.111 26-Apr-2004  itojun make TCP MD5 signature work with KAME IPSEC (#define IPSEC).

support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream
for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the
right thing).

XXX
current TCP MD5 signature code has giant flaw:
it does not validate signature on input (can't believe it! what is the point?)
 1.110 25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do not see any code
that verifies received TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.109 21-Apr-2004  itojun no space between function name and paren: foo (blah) -> foo(blah)
 1.108 20-Apr-2004  itojun - respond to RST by ACK, as suggested in NISCC recommendation
- rate-limit ACKs against RSTs and SYNs
 1.107 18-Apr-2004  matt De __P()
 1.106 22-Oct-2003  thorpej branches: 1.106.2;
Rather than zeroing a tcpcb structure and filling in all the fields
individually, create a tcpcb template pre-initialized (and pre-zero'd)
with the static and mostly-static tcpcb parameters. The template is
now copied into the new tcpcb, which zeros and initializes most of the
tcpcb in one pass. The template is kept up-to-date as TCP sysctl
variables are changed.

Combined with the previous sb_max change, TCP socket creation is now
25% faster.
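
A rough sketch of the template approach, with a cut-down stand-in for
the tcpcb. The field selection, the initial values (536 and 3), and the
helper names are illustrative assumptions; the commit text only
establishes that a pre-initialized template is copied into each new
tcpcb and is kept up to date when sysctl variables change.

```c
#include <stdbool.h>

struct tcpcb_sketch {
	int      t_state;	/* left zero by the template */
	unsigned t_peermss;	/* static default */
	unsigned t_rxtcur;	/* static default */
	bool     do_sack;	/* mostly-static: follows a tunable */
	void    *t_inpcb;	/* per-connection, set after the copy */
};

/* Fields not named here are zero, so the template is "pre-zero'd". */
static struct tcpcb_sketch tcpcb_template = {
	.t_peermss = 536,	/* illustrative value */
	.t_rxtcur  = 3,		/* illustrative value */
	.do_sack   = true,
};

/* Called when the corresponding tunable changes. */
static void
template_set_sack(bool on)
{
	tcpcb_template.do_sack = on;
}

/* One structure copy zeros and initializes most of the new tcpcb. */
static void
tcpcb_init(struct tcpcb_sketch *tp, void *inp)
{
	*tp = tcpcb_template;
	tp->t_inpcb = inp;
}
```

Changing the tunable affects only control blocks created afterwards;
existing ones keep the values they were copied with.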
 1.105 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.104 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.103 20-Jul-2003  he As a temporary workaround, apply the fix from PR#20390, thereby
cooperating with the callout code in working around the race
condition caused by the TCP code's use of the callout facility.

Instead of unconditionally releasing memory in tcp_close() and
SYN_CACHE_PUT(), check whether any of the related callout handlers
are about to be invoked (but have not yet done callout_ack()), and
if so, just mark the associated data structure (tcpcb or syn cache
entry) as "dead", and test for this (and release storage) in the
callout handler functions.
 1.102 29-Jun-2003  fvdl branches: 1.102.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.101 29-Jun-2003  ragge Add code to remember where in the send queue of mbufs the last packet was
sent from. This change avoids a linear search through all mbufs when using
large TCP windows, and therefore permits high-speed connections over long
distances.

Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance
of about 15000km. With TCP windows of just over 20 Mbytes it could keep up
with 950Mbit/s.

After discussions with Matt Thomas and Jason Thorpe.
 1.100 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.99 26-Jun-2003  christos abuse the mib instead of abusing the new pointer. Idea from simon burge.
It allows the tcp_sysctl_ident to be run by non-super-users. No backwards
compatibility provided.
 1.98 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.97 19-Apr-2003  christos PR/2352: Tor Egge: Add sysctl to get uid of connected socket.
 1.96 01-Mar-2003  thorpej Allow TCP connections to hosts on a local network to use a larger
slow start initial window. Default this larger initial window to
4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
 1.95 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.94 02-Nov-2002  perry /*CONTCOND*/ while (0)'ed macros
 1.93 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
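
The shape of such an alignment test and the align-if-needed path might
be sketched as below. HDR_ALIGNED_P, ptr_aligned4, hdr_sketch, and
hdr_align are hypothetical names, and a plain memcpy into caller-supplied
storage stands in for m_copyup(); only the idea that the macro expands
to 1 when __NO_STRICT_ALIGNMENT is defined comes from the text.

```c
#include <stdint.h>
#include <string.h>

/* True if p sits on a 4-byte boundary. */
static int
ptr_aligned4(const void *p)
{
	return ((uintptr_t)p & 3) == 0;
}

#ifdef __NO_STRICT_ALIGNMENT
#define HDR_ALIGNED_P(p)	1	/* no test, no align path */
#else
#define HDR_ALIGNED_P(p)	ptr_aligned4(p)
#endif

struct hdr_sketch {		/* stand-in for a protocol header */
	uint32_t src;
	uint32_t dst;
};

/*
 * Return a pointer that is safe to use as a struct hdr_sketch: the
 * input itself if it is already aligned, otherwise an aligned copy
 * placed in the caller's scratch storage.
 */
static const struct hdr_sketch *
hdr_align(const unsigned char *p, struct hdr_sketch *scratch)
{
	if (HDR_ALIGNED_P(p))
		return (const struct hdr_sketch *)(const void *)p;
	memcpy(scratch, p, sizeof(*scratch));
	return scratch;
}
```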
 1.92 09-Jun-2002  itojun whitespace
 1.91 26-May-2002  itojun path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
 1.90 12-May-2002  matt branches: 1.90.2; 1.90.4;
Eliminate commons.
 1.89 15-Mar-2002  itojun have tcp6_drain
 1.88 24-Jan-2002  itojun place NRL copyright notice itself, not a reference to it.
 1.87 11-Sep-2001  thorpej Use callouts for SYN cache timers, rather than traversing time queues
in tcp_slowtimo().
 1.86 10-Sep-2001  thorpej Use callouts for TCP timers, rather than traversing the list of
all open TCP connections in tcp_slowtimo() (which is called 2x
per second). It's fairly rare for TCP timers to actually fire,
so saving this list traversal is good, especially if you want
to scale to thousands of open connections.
 1.85 10-Sep-2001  thorpej Split tcp_timers() into multiple functions, one for each timer,
and call it directly from tcp_slowtimo() (via a table) rather
than going through tcp_userreq().

This will allow us to call TCP timers directly from callouts,
in a future revision.
 1.84 10-Sep-2001  thorpej Change the way receive idle time and round trip time are measured.
Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now
take a timestamp (via tcp_now) and use subtraction to compute the
delta when we actually need it (using unsigned arithmetic so that
tcp_now wrapping is handled correctly).

Based on similar changes in FreeBSD.
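
Why unsigned subtraction tolerates the counter wrapping can be seen in
a few lines. The names tcp_now_sketch and ticks_since are hypothetical,
and a 32-bit counter is assumed.

```c
#include <stdint.h>

static uint32_t tcp_now_sketch;	/* free-running tick counter */

/*
 * Ticks elapsed since 'stamp'.  Unsigned arithmetic is modulo 2^32,
 * so the result is correct even if the counter wrapped in between,
 * provided fewer than 2^32 ticks have passed.
 */
static uint32_t
ticks_since(uint32_t stamp)
{
	return tcp_now_sketch - stamp;
}
```

With a stamp of 0xfffffffe and a current counter of 5, the function
returns 7: two ticks up to the wrap and five after it.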
 1.83 10-Sep-2001  thorpej Use a callout for the delayed ACK timer, and delete tcp_fasttimo().
Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
 1.82 31-Jul-2001  thorpej branches: 1.82.2;
Count the number of times we "self-quench" (ip_output() returns
ENOBUFS), and don't inline tcp_segsize() if profiling.
 1.81 30-May-2001  mrg branches: 1.81.2;
use _KERNEL_OPT
 1.80 26-May-2001  matt Make t_flags a u_int instead of u_short. It's followed by a mbuf pointer
so there's padding around it already. And it increases the amount of bits
available for TF_* flags.
 1.79 13-Apr-2001  thorpej Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.78 20-Mar-2001  thorpej Two changes, designed to make us even more resilient against TCP
ISS attacks (which we already fend off quite well).

1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic
hash method of generating TCP ISS values. Note, this code is experimental
and disabled by default (experimental enough that I don't export the
variable via sysctl yet, either). There are a couple of issues I'd
like to discuss with Steve, so this code should only be used by people
who really know what they're doing.

2. Per a recent thread on Bugtraq, it's possible to determine a system's
uptime by snooping the RFC1323 TCP timestamp options sent by a host; in
4.4BSD, timestamps are created by incrementing the tcp_now variable
at 2 Hz; there's even a company out there that uses this to determine
web server uptime. According to Newsham's paper "The Problem With
Random Increments", while NetBSD's TCP ISS generation method is much
better than the "random increment" method used by FreeBSD and OpenBSD,
it is still theoretically possible to mount an attack against NetBSD's
method if the attacker knows how many times the tcp_iss_seq variable
has been incremented. By not leaking uptime information, we can make
that much harder to determine. So, we avoid the leak by giving each
TCP connection a timebase of 0.
 1.77 19-Oct-2000  itojun branches: 1.77.2;
remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c
(separate TCP/IPv6 stack) into netbsd-current.
 1.76 18-Oct-2000  thorpej Restructure the Path MTU Discovery code somewhat to avoid
entering rtentry's for hosts we're not actually communicating
with.

Do this by invoking the ctlinput for the protocol, which is
responsible for validating the ICMP message:
* TCP -- Lookup the connection based on the address/port
pairs in the ICMP message.
* AH/ESP -- Lookup the SA based on the SPI in the ICMP message.

If validation succeeds, ctlinput is responsible for calling
icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered
by protocols (such as TCP) which want to take some sort of special
action when a path's MTU changes. For TCP, this is where we now
refresh cached routes and re-enter slow-start.

As a side-effect, this fixes the problem where TCP would not be
notified when a path's MTU changed if AH/ESP were being used.

XXX Note, this is only a fix for the IPv4 case. For the IPv6
XXX case, we need to wait for the KAME folks.

Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
 1.75 15-Aug-2000  itojun net.inet.tcp.rstratelimit is deprecated. make it invalid and return
ENOPROTOOPT.
 1.74 28-Jul-2000  itojun nuke the following sysctl variables. "ppsratelimit" should work better.
need to recompile sbin/sysctl after updating /usr/include.
net.inet.tcp.rstratelimit
net.inet.icmp.errratelimit
net.inet6.icmp6.errratelimit
 1.73 27-Jul-2000  itojun implement net.inet.tcp.rstppslimit to limit TCP RSTs by packet-per-second
basis. default: 100pps

set default value for net.inet.tcp.rstratelimit to 0 (disabled),
NOTE: it does not work right for smaller-than-1/hz interval. maybe we should
nuke it, or make it impossible to set smaller-than-1/hz value.
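
A packets-per-second limit of this kind can be sketched by counting
packets within the current one-second window. This is an illustrative
userland sketch with hypothetical names (pps_state, pps_allow), not the
kernel's own rate-check code; the caller supplies the current time in
whole seconds.

```c
#include <stdbool.h>
#include <time.h>

struct pps_state {
	time_t window;	/* second in which 'count' was accumulated */
	int    count;	/* packets allowed so far in that second */
};

/* Return true if one more packet may be sent during second 'now'. */
static bool
pps_allow(struct pps_state *st, time_t now, int maxpps)
{
	if (now != st->window) {	/* new second: start a new count */
		st->window = now;
		st->count = 0;
	}
	if (st->count >= maxpps)
		return false;
	st->count++;
	return true;
}
```

With maxpps set to 100, the first 100 calls within one second succeed
and the rest fail until the second changes. Unlike a minimum-interval
limit, this does not depend on intervals shorter than one clock tick.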
 1.72 15-Feb-2000  thorpej branches: 1.72.4;
Add support for rate-limiting RSTs sent in response to no socket for
an incoming packet. Default minimum interval is 10ms. The interval
is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.
 1.71 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.70 08-Dec-1999  itojun do not drop from IP header to tcp option until sbappend(), to reduce
requirement to mbuf chain.
part of KAME sync, committed separately for its (possible) impact.
 1.69 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.68 23-Sep-1999  itojun branches: 1.68.2; 1.68.8;
cleanup and correct TCP MSS consideration with IPsec headers.

MSS advertisement must always be:
max(if mtu) - ip hdr siz - tcp hdr siz
We violated this in the previous code so it was fixed.

tcp_mss_to_advertise() now takes af (af on wire) as its argument,
to compute right ip hdr siz.

tcp_segsize() will take care of IPsec header size.
One thing I'm not really sure is how to handle IPsec header size in
*rxsegsizep (inbound segment size estimation).
The current code subtracts possible *outbound* IPsec size from *rxsegsizep,
hoping that the peer is using the same IPsec policy as me.
It may not be applicable; could a TCP guru please comment...
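The advertisement rule stated above (largest interface MTU minus IP header size minus TCP header size) can be sketched as below. The function and constant names are hypothetical, the header sizes are the base sizes without options, and the real tcp_mss_to_advertise() may differ in its details.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address-family selector standing in for "af on wire". */
enum sketch_af { SKETCH_AF_INET, SKETCH_AF_INET6 };

#define SKETCH_IPV4_HDR_LEN 20u /* IPv4 header without options */
#define SKETCH_IPV6_HDR_LEN 40u /* IPv6 base header */
#define SKETCH_TCP_HDR_LEN  20u /* TCP header without options */

/*
 * MSS to advertise: max(if mtu) - ip hdr size - tcp hdr size.
 * Returns 0 if the largest MTU cannot even hold the headers.
 */
static unsigned int
sketch_mss_to_advertise(const unsigned int *if_mtus, size_t n, enum sketch_af af)
{
	unsigned int max_mtu = 0;
	unsigned int overhead;
	size_t i;

	assert(if_mtus != NULL && n > 0);
	for (i = 0; i < n; i++) {
		if (if_mtus[i] > max_mtu)
			max_mtu = if_mtus[i];
	}
	overhead = (af == SKETCH_AF_INET6 ? SKETCH_IPV6_HDR_LEN
	    : SKETCH_IPV4_HDR_LEN) + SKETCH_TCP_HDR_LEN;
	return max_mtu > overhead ? max_mtu - overhead : 0;
}
```

For a single 1500-byte interface this gives 1460 over IPv4 and 1440 over IPv6. IPsec header size is deliberately left out here, since the log assigns that to tcp_segsize().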
 1.67 25-Aug-1999  itojun When a listening socket goes away, remove associated syn cache entries.
Stale syn cache entries are useless because none of them will be used
if there is no listening socket, as tcp_input looks up listening socket by
in_pcblookup*() before looking into syn cache.

This fixes race condition due to dangling socket pointer from syn cache
entries to listening socket (this was introduced when ipsec is merged in).

This should preserve currently implemented behavior (but not 4.4BSD
behavior prior to syn cache).

Tested in KAME repository before commit, but we'd better run some
regression tests.
 1.66 12-Aug-1999  itojun fix sototcpcb(). this sometimes caused panic on OOB data reception.

the macro may need to be turned into a dedicated function
to catch unsupported values.
 1.65 31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet direction.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.64 22-Jul-1999  itojun - implement IPv6 pmtud, which is necessary for TCP6.
- fix memory leak on SO_DEBUG over TCP.
 1.63 14-Jul-1999  itojun Use proper ip protocol # field and tcp hdr on sending RST against SYN,
when ip header and tcp header are not adjacent to each other
(i.e. when ip6 options are attached).

To test this, try
telnet @::1@::1 port
toward a port with no server listening. Prior to the fix, the kernel would
generate a broken RST packet.
 1.62 09-Jul-1999  thorpej defopt INET6, and put it in opt_inet.h (most places already include this
file, which is why the file list is so short).
 1.61 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.60 23-May-1999  ad Add new sysctl (net.inet.tcp.log_refused) that when set, causes refused TCP
connections to be logged.
 1.59 29-Apr-1999  thorpej Implement retransmit logic for the SYN cache engine. Fixes a rare condition
where one side can think a connection exists, where the other side thinks
the connection was never established.

The original problem was first reported by Ty Sarna in PR #5909. The
original fix I made to the code didn't cover all cases. The problem this
fix addresses was reported by Christoph Badura via private e-mail.

Many thanks to Bill Sommerfeld for helping me to test this code, and
for finding a subtle bug.
 1.58 24-Jan-1999  thorpej branches: 1.58.2;
Oops, forgot to update copyright notice in previous.
 1.57 24-Jan-1999  thorpej * Completely rewrite syn_cache_respond().
- Don't use tcp_respond(), instead create the tcp/ip header from scratch,
and send it ourself.
- Reuse the mbuf that carried the SYN, or allocate one if that is not
available.
- Cache the route we look up to do the Path MTU Discovery check, and
transfer the reference to that route to the inpcb when the connection
completes.
* Macro'ize a small, but often repeated code fragment.
 1.56 18-Dec-1998  thorpej Add a lock around the TCPCB's sequence queue, to prevent tcp_drain()
from corrupting the queue if called from a device's interrupt context.

Similar in nature to the problem reported in PR #5684.
 1.55 06-Oct-1998  matt Add a sysctl for newreno (default to off).
 1.54 04-Oct-1998  matt Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP
to NetBSD. Ignore the SACK & FACK stuff for now.
 1.53 10-Sep-1998  mouse Create tcp.keepidle, tcp.keepintvl, tcp.keepcnt, tcp.slowhz sysctls.
 1.52 09-Sep-1998  thorpej Use an algorithm similar to that in tcp_notify() to determine if
syn_cache_unreach() should remove the entry, or just continue on.

Algorithm is to only remove the entry if we've had more than one unreach
error and have retransmitted 3 or more times. This prevents the following
scenario, as noted in PR #5909 (PR from Ty Sarna, scenario from
Charles Hannum):

* Host A sends a SYN.
* Host A retransmits the SYN.
* Host B gets the first SYN and sends a SYN-ACK.
* Host B gets the second SYN and sends a SYN-ACK.
* One of the SYN-ACK bounces with an
ICMP unreachable, causing the `SYN cache' entry to be
removed with no notification.
* Host A receives the other SYN-ACK, sends an ACK, and goes to
ESTABLISHED state.

Should fix PR #5909.
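The removal rule described in this entry can be sketched as a small predicate. The structure and field names are hypothetical, and counting the current ICMP error before testing is an assumption; the actual syn_cache_unreach() may keep its state differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-entry counters; not the kernel's syn cache entry. */
struct sketch_syn_entry {
	unsigned int unreach_count; /* ICMP unreachable errors seen so far */
	unsigned int rexmit_count;  /* SYN-ACK retransmissions so far */
};

/*
 * Called when an ICMP unreachable arrives for this entry.
 * Returns true only if the entry should be removed: more than one
 * unreach error AND three or more retransmissions.
 */
static bool
sketch_unreach_should_remove(struct sketch_syn_entry *e)
{
	assert(e != NULL);
	e->unreach_count++; /* assumption: the current error is counted */
	return e->unreach_count > 1 && e->rexmit_count >= 3;
}
```

In the scenario from the log, the single bounced SYN-ACK is the first unreach error, so the entry survives and the ACK from Host A can still complete the connection.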
 1.51 21-Jul-1998  mycroft Implement a better fix for the `gratuitous FIN' problem, as
mentioned on tcp-impl but with a bit more commentary.
 1.50 11-May-1998  thorpej Nuke TUBA per my note to tech-net; there's no reason to keep it around.
 1.49 07-May-1998  thorpej Rework the syn cache code somewhat:
- Don't use home-grown queue manipulation. Use <sys/queue.h> instead. The
data structures are a little larger, but we are otherwise wasting the
memory chunk anyway (we're already a 64-byte malloc bucket).
- Fix a bug in the cache-is-full case: if the oldest element removed from
the first non-empty bucket was the only element in the bucket, the
bucket wouldn't be removed from the bucket cache, causing queue corruption
later.
- Optimize the syn cache timers by using PRT timers rather than home-grown
decrement-and-propagate timers.

This code is now a fair bit smaller, and significantly easier to read
and understand.
 1.48 06-May-1998  thorpej Use the monotonically increasing slow timer timestamp provided by
the protocol dispatch layer for TCP timers. This saves having to
modify a potentially large number of timer values (which were shorts,
and expanded to ... a lot of code on the Alpha).
 1.47 02-May-1998  thorpej Reintroduce the immediate ACK-on-PUSH behavior removed in revision 1.47,
but make the decision to do this dependent on the sysctl variable
net.inet.tcp.ack_on_push, which is disabled by default.
 1.46 01-May-1998  thorpej Garbage-collect.
 1.45 30-Apr-1998  thorpej In the CWM code, don't use the Floyd initial window computation as
the burst size allowed, but rather a fixed number of packets, as
described in the Internet Draft. Default allowed burst is 4 packets,
per the Draft.

Make the use of CWM and the allowed burst size tunable via sysctl.
 1.44 30-Apr-1998  thorpej Make tcp_compat_42 a sysctl option.
 1.43 29-Apr-1998  matt New TCP reassembly code. The new code reduces the memory needed by
out-of-order packets and builds the infrastructure needed for sending
SACK blocks (to be added shortly).
 1.42 29-Apr-1998  thorpej Make use of the work-arounds for ancient broken TCP peers run-time
conditional (tcp_compat_42). The kernel config option TCP_COMPAT_42
will still enable this by default, or disable this by default if the
option is not included (i.e. current behavior). This will be made a
sysctl soon.
 1.41 13-Apr-1998  kml Fix to ensure that the correct MSS is advertised for loopback
TCP connections by using the MTU of the interface. Also added
a knob, mss_ifmtu, to force all connections to use the MTU of
the interface to calculate the advertised MSS.
 1.40 07-Apr-1998  thorpej Remember any source routes that may have accompanied a SYN.
 1.39 03-Apr-1998  thorpej Now that we have a flags word in the syn cache entry, use a flag to indicate
"peer will do timestamps" rather than a bitfield, and give the now-unused
bit to the hash, making it now 32 bits.
 1.38 03-Apr-1998  thorpej Clean up some comments wrt. the syn cache code.
 1.37 31-Mar-1998  thorpej Fix a potential-congestion case in the larger initial congestion window
code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN
(active open) or SYN,ACK (passive open) was retransmitted, the initial
congestion window for the first slow start of that connection must be
one segment.
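The rule in this entry can be sketched as below. The function name and the idea of passing the configured window as a segment count are assumptions made for illustration; only the "one segment after a retransmitted SYN or SYN,ACK" rule comes from the log.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Initial congestion window in bytes for the first slow start.
 * segsize:           transmit segment size in bytes
 * init_win_segs:     configured initial window, in segments (hypothetical knob)
 * syn_retransmitted: true if the SYN (active open) or SYN,ACK (passive open)
 *                    had to be retransmitted
 */
static unsigned long
sketch_initial_cwnd(unsigned long segsize, unsigned int init_win_segs,
    bool syn_retransmitted)
{
	assert(segsize > 0 && init_win_segs > 0);
	if (syn_retransmitted)
		return segsize; /* must be exactly one segment */
	return (unsigned long)init_win_segs * segsize;
}
```

With a 1460-byte segment and a configured window of 4 segments, a clean handshake starts at 5840 bytes, while a handshake that needed a retransmission starts at 1460 bytes.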
 1.36 17-Mar-1998  kml Ensure that the TCP segment size reflects the size of TCP options
in the packet. This fixes a bug that was resulting in extra packets
in retransmissions (the second packet would be 12 bytes long,
reflecting the RFC1323 timestamp option size).
 1.35 19-Feb-1998  thorpej Update copyright (sigh, should have done this long ago).
 1.34 10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.33 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.32 31-Dec-1997  thorpej Implement a queue for delayed ACK processing. This queue is used in
tcp_fasttimo() in lieu of scanning all open TCP connections.
 1.31 17-Dec-1997  thorpej Keep stats on connections dropped due to excessive persist timeout.
 1.30 13-Dec-1997  thorpej After further examination of traces of bulk transfers (with help from
Kevin Lahey), undo the "defer window update until next delayed ACK".
 1.29 11-Dec-1997  thorpej Implement an infrastructure to allow larger initial congestion windows.
The sysctl'able variable "tcp_init_win", when set to 0, selects an
auto-tuning algorithm for selecting the initial window, based on transmit
segment size, per discussion in the IETF tcpimpl working group.

Default initial window is still 1 segment, but will soon become 2 segments,
per discussion in tcpimpl.
 1.28 11-Dec-1997  thorpej In the PRU_RCVD entry point, if TF_DELACK is set, don't send the window
update now, since it will be sent within 200ms when the delayed ACK is
sent. Instrument how many hits we get on this optimization.
 1.27 10-Dec-1997  thorpej Implement tcp_drain().
 1.26 08-Nov-1997  kml TCP MSS fixes to provide cleaner slow-start and recovery.
 1.25 17-Oct-1997  kml branches: 1.25.2;
Path MTU Discovery support. This is turned off by default.
Use sysctl -w net.inet.icmp.mtudisc=1 to turn on.
Still to come: path removal after some period, black hole detection
 1.24 10-Oct-1997  explorer Add hooks to use the kernel random system to generate TCP sequence numbers.
 1.23 22-Sep-1997  thorpej Fix several annoyances related to MSS handling in BSD TCP:
- Don't overload t_maxseg. Previous behavior was to set it to the min
of the peer's advertised MSS, our advertised MSS, and tcp_mssdflt
(for non-local networks). This breaks PMTU discovery running on
either host. Instead, remember the MSS we advertise, and use it
as appropriate (in silly window avoidance).
- Per last bullet, split tcp_mss() into several functions for handling
MSS (ours and peer's), and performing various tasks when a connection
becomes ESTABLISHED.
- Introduce a new function, tcp_segsize(), which computes the max size
for every segment transmitted in tcp_output(). This will eventually
be used to hook in PMTU discovery.
 1.22 29-Aug-1997  gwr Tweaks to allow operation with an interface address of 0.0.0.0
(needed for NFS mountroot using BOOTP to get boot parameters)
 1.21 28-Jul-1997  thorpej branches: 1.21.2;
Make the following tunable via sysctl, inspired by BSD/OS:
- tcp_sendspace
- tcp_recvspace
- tcp_mssdflt
- tcp_syn_cache_limit
- tcp_syn_bucket_limit
- tcp_syn_cache_timer
 1.20 23-Jul-1997  thorpej Pull SYN_cache_branch down into the main line.
 1.19 10-Dec-1996  mycroft branches: 1.19.8;
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.18 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.17 13-Feb-1996  christos branches: 1.17.4;
netinet prototypes
 1.16 31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.15 21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
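The "queue element" approach described in this entry can be sketched as follows. All names are hypothetical and the header is reduced to an opaque 20-byte block; the point is only that the wire-format header keeps its size on every platform because the pointers live in a separate structure.

```c
#include <assert.h>
#include <stddef.h>

/* Wire-format header stand-in: fixed 20 bytes, contains no pointers. */
struct sketch_ip_hdr {
	unsigned char bytes[20];
};

/* Separate queue element that does the threading and points at the header. */
struct sketch_qent {
	struct sketch_qent   *next;
	struct sketch_ip_hdr *hdr;
};

/* Link a header into the list by pushing its queue element at the head. */
static void
sketch_q_push(struct sketch_qent **head, struct sketch_qent *ent,
    struct sketch_ip_hdr *hdr)
{
	assert(head != NULL && ent != NULL && hdr != NULL);
	ent->hdr = hdr;
	ent->next = *head;
	*head = ent;
}
```

Because struct sketch_ip_hdr holds only bytes, its size is 20 whether pointers are 32 or 64 bits wide; only struct sketch_qent grows on 64-bit machines.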
 1.14 30-Sep-1995  thorpej branches: 1.14.2;
Implement tcp_sysctl(). Add a sysctl option to enable/disable RFC1323
extensions to TCP. From John Kohl <jtk@kolvir.blrc.ma.us>.
 1.13 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.12 11-Jun-1995  mycroft As suggested by Brakmo and Peterson:
* Don't add the extra 1/8 of the mss when ramping up the congestion window.
* Scale the RTT values slightly to adjust for rounding errors.
* Set the lower bound of the RTO to RTT+2.
 1.11 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.10 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.9 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.8 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.7 10-Jan-1994  mycroft Change the counters to be all the same type -- u_long.
 1.6 10-Jan-1994  mycroft Don't prototype this until it's safe.
 1.5 08-Jan-1994  mycroft Prototypes.
 1.4 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.14.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.17.4.2 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.17.4.1 10-Dec-1996  mycroft From trunk:
Fix RTT scaling problems introduced with Brakmo and Peterson changes.
 1.19.8.6 16-Jul-1997  thorpej Declare struct tcp_opt_info here; it's needed by tuba_tcpinput().
 1.19.8.5 29-Jun-1997  thorpej Instrument syn cache hash collisions.
 1.19.8.4 28-Jun-1997  thorpej KNF.
 1.19.8.3 28-Jun-1997  thorpej Use explicit type sizes in struct syn_cache, and add a comment about
this structure being larger than intended on the Alpha.
 1.19.8.2 26-Jun-1997  thorpej tcp_mss() needs to take a u_int, not a u_int16_t.
 1.19.8.1 14-May-1997  mellon More of David Borman's SYN cache patches for Lite2:

- Define syn_cache entry and syn_cache_head structures.
- Add syn_cache statistics to tcpstat structure.
- Declare externs for syn cache variables.
- Update prototypes: tcp_dooptions, tcp_mss, tcp_respond.
- Add prototypes for syn_cache_* functions.
 1.21.2.3 14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.21.2.2 29-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.21.2.1 01-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.25.2.4 09-May-1998  mycroft Pull up patch from kml.
 1.25.2.3 05-May-1998  mycroft Pull up 1.36, per request of kml.
 1.25.2.2 29-Jan-1998  mellon Pull up 1.27-1.33 (thorpej)
 1.25.2.1 08-Nov-1997  thorpej Pull up from trunk: TCP MSS fixes to provide cleaner slow-start and recovery.
(kml)
 1.58.2.1 29-Apr-1999  perry branches: 1.58.2.1.2; 1.58.2.1.4;
pullup 1.58->1.59 (thorpej)
 1.58.2.1.4.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.58.2.1.4.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.58.2.1.4.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.58.2.1.2.3 02-Aug-1999  thorpej Update from trunk.
 1.58.2.1.2.2 01-Jul-1999  thorpej Sync w/ -current.
 1.58.2.1.2.1 21-Jun-1999  thorpej Sync w/ -current.
 1.68.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.68.2.3 21-Apr-2001  bouyer Sync with HEAD
 1.68.2.2 27-Mar-2001  bouyer Sync with HEAD.
 1.68.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.72.4.3 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #143)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.72.4.2 24-Jan-2002  he Pull up revision 1.88 (requested by itojun):
Clean up the NRL copyright.
 1.72.4.1 16-Aug-2000  itojun pullup (approved by releng-1-5)

switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.

(tags are a rough estimate - we had some trial-and-error in main trunk)
sys/netinet/icmp6.h 1.9 -> 1.11
sys/netinet/icmp_var.h 1.15 -> 1.17
sys/netinet/in_proto.c 1.39 -> 1.42
sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54
sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117
sys/netinet/tcp_usrreq.c 1.52 -> 1.53
sys/netinet/tcp_var.h 1.72 -> 1.75
sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38
sys/netinet6/in6_proto.c 1.17 -> 1.19
 1.77.2.9 11-Nov-2002  nathanw Catch up to -current
 1.77.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.77.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.77.2.6 01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.77.2.5 28-Feb-2002  nathanw Catch up to -current.
 1.77.2.4 21-Sep-2001  nathanw Catch up to -current.
 1.77.2.3 24-Aug-2001  nathanw Catch up with -current.
 1.77.2.2 21-Jun-2001  nathanw Catch up to -current.
 1.77.2.1 09-Apr-2001  nathanw Catch up with -current.
 1.81.2.5 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.81.2.4 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.81.2.3 11-Feb-2002  jdolecek Sync w/ -current.
 1.81.2.2 13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.81.2.1 03-Aug-2001  lukem update to -current
 1.82.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.90.4.3 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #1680)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.90.4.2 22-Oct-2003  jmc Pullup rev 1.03 (requested by he in ticket #1530)


Introduce a new INVOKING status for callouts, and use it to close
a race condition in the TCP code. Fixes PR#20390.
 1.90.4.1 05-Sep-2003  tron Pull up revision 1.91 (requested by tls in ticket #1445):
path MTU discovery blackhole detection.
PR 12790 (sorry for not committing it for a long time)
 1.90.2.3 15-Jul-2002  gehenna catch up with -current.
 1.90.2.2 20-Jun-2002  gehenna catch up with -current.
 1.90.2.1 30-May-2002  gehenna Catch up with -current.
 1.102.2.12 11-Dec-2005  christos Sync with head.
 1.102.2.11 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.102.2.10 01-Apr-2005  skrll Sync with HEAD.
 1.102.2.9 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.102.2.8 07-Feb-2005  skrll Sync with HEAD.
 1.102.2.7 04-Feb-2005  skrll Sync with HEAD.
 1.102.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.102.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.102.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.102.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.102.2.2 03-Aug-2004  skrll Sync with HEAD
 1.102.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.106.2.2 18-Sep-2004  he Pull up revision 1.113 (requested by yamt in ticket #861):
Fix ipqent pool corruption problems. Make the TCP reassembly
code use its own pool of ipqent rather than sharing it with
the IP reassembly code. Fixes PR#24782.
 1.106.2.1 20-Apr-2004  jmc Pullup patch (requested by itojun in ticket #169)

If a segment is received with RST set and the segment is completely to the
left of the receive window, ignore it. Add some additional comments to
the code that deals with received segments that are completely to the right
of the receive window. If an invalid SYN is received, force an ACK and
drop it; if the other side really sent the SYN, it'll respond with a reset.
Respond to RST by ACK, as suggested in NISCC recommendation.
Rate-limit ACKs against RSTs and SYNs.
If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
 1.115.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.115.4.1 12-Feb-2005  yamt sync with head.
 1.115.2.1 29-Apr-2005  kent sync with -current
 1.123.2.2 06-May-2005  tron Pull up revision 1.125 (requested by kurahone in ticket #199):
Added sysctl tunable limits for the number of maximum SACK holes
per connection and per system.
Idea taken from FreeBSD.
 1.123.2.1 04-Apr-2005  tron Pull up revision 1.124 (requested by yamt in ticket #90):
protect tcpipqent with splvm.
 1.126.2.6 17-Mar-2008  yamt sync with head.
 1.126.2.5 21-Jan-2008  yamt sync with head
 1.126.2.4 03-Sep-2007  yamt sync with head.
 1.126.2.3 26-Feb-2007  yamt sync with head.
 1.126.2.2 30-Dec-2006  yamt sync with head.
 1.126.2.1 21-Jun-2006  yamt sync with head.
 1.133.6.1 22-Apr-2006  simonb Sync with head.
 1.133.4.3 09-Sep-2006  rpaulo sync with head
 1.133.4.2 14-Mar-2006  rpaulo Remove in6pcb in parameter list.
 1.133.4.1 14-Mar-2006  rpaulo Remove back pointer to in6pcb.
 1.133.2.1 18-Feb-2006  yamt sync with head.
 1.134.2.2 14-Sep-2006  yamt sync with head.
 1.134.2.1 11-Aug-2006  yamt sync with head
 1.137.4.2 10-Dec-2006  yamt sync with head.
 1.137.4.1 22-Oct-2006  yamt sync with head
 1.137.2.2 12-Jan-2007  ad Sync with head.
 1.137.2.1 18-Nov-2006  ad Sync with head.
 1.141.4.1 03-Jun-2008  skrll Sync with netbsd-4.
 1.141.2.1 21-Jan-2008  bouyer Pull up following revision(s) (requested by ghen in ticket #1039):
sys/netinet/tcp_var.h: revision 1.148
distrib/sets/lists/comp/mi: revision 1.1035
distrib/sets/lists/man/mi: revision 1.1010
usr.sbin/tcpdrop/Makefile: revision 1.1
usr.sbin/tcpdrop/tcpdrop.c: revision 1.1 - 1.3
usr.sbin/tcpdrop/tcpdrop.8: revision 1.1
usr.sbin/Makefile: revision 1.228 via patch
sys/netinet/tcp_usrreq.c: revision 1.133
distrib/sets/lists/base/mi: revision 1.712
Import tcpdrop(8) from OpenBSD
 1.143.2.3 07-May-2007  yamt sync with head.
 1.143.2.2 12-Mar-2007  rmind Sync with HEAD.
 1.143.2.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.145.4.1 11-Jul-2007  mjf Sync with head.
 1.145.2.4 20-Aug-2007  ad Sync with HEAD.
 1.145.2.3 15-Jul-2007  ad Sync with head.
 1.145.2.2 01-Jul-2007  ad Adapt to callout API change.
 1.145.2.1 08-Jun-2007  ad Sync with head.
 1.149.2.1 15-Aug-2007  skrll Sync with HEAD.
 1.150.20.2 02-Aug-2007  rmind TCP socket buffers automatic sizing - ported from FreeBSD.
http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html

! Disabled by default, marked as experimental. Testers are much needed.
! Someone should thoroughly test this, and improve if possible.

Discussed on <tech-net>:
http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html
Thanks Greg Troxel for comments.

OK by the long silence on <tech-net>.
 1.150.20.1 02-Aug-2007  rmind file tcp_var.h was added on branch matt-mips64 on 2007-08-02 02:42:43 +0000
 1.150.16.1 02-Jan-2008  bouyer Sync with HEAD
 1.150.12.1 26-Dec-2007  ad Sync with head.
 1.150.10.1 18-Feb-2008  mjf Sync with HEAD.
 1.150.4.2 23-Mar-2008  matt sync with HEAD
 1.150.4.1 09-Jan-2008  matt sync with HEAD
 1.151.6.3 28-Sep-2008  mjf Sync with HEAD.
 1.151.6.2 02-Jun-2008  mjf Sync with HEAD.
 1.151.6.1 03-Apr-2008  mjf Sync with HEAD.
 1.151.2.1 24-Mar-2008  keiichi sync with head.
 1.155.2.1 18-May-2008  yamt sync with head.
 1.156.2.5 11-Mar-2010  yamt sync with head
 1.156.2.4 16-Sep-2009  yamt sync with head
 1.156.2.3 20-Jun-2009  yamt sync with head
 1.156.2.2 04-May-2009  yamt sync with head.
 1.156.2.1 16-May-2008  yamt sync with head.
 1.157.6.1 19-Oct-2008  haad Sync with HEAD.
 1.157.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.158.10.1 21-Apr-2010  matt sync to netbsd-5
 1.158.4.1 26-Sep-2009  snj Pull up following revision(s) (requested by darran in ticket #950):
sys/netinet/tcp_input.c: revision 1.299
sys/netinet/tcp_usrreq.c: revision 1.156
sys/netinet/tcp_var.h: revision 1.161
Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl.
Okayed by tls@.
 1.158.2.1 03-Mar-2009  skrll Sync with HEAD.
 1.159.2.1 23-Jul-2009  jym Sync with HEAD.
 1.162.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.162.4.2 31-May-2011  rmind sync with head
 1.162.4.1 21-Apr-2011  rmind sync with head
 1.168.6.1 18-Feb-2012  mrg merge to -current.
 1.168.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.168.2.1 17-Apr-2012  yamt sync with head
 1.169.6.3 03-Dec-2017  jdolecek update from HEAD
 1.169.6.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.169.6.1 23-Jun-2013  tls resync from head
 1.170.4.3 18-May-2014  rmind sync with head
 1.170.4.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
 1.170.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.172.2.1 10-Aug-2014  tls Rebase.
 1.175.4.2 28-Aug-2017  skrll Sync with HEAD
 1.175.4.1 06-Apr-2015  skrll Sync with HEAD
 1.175.2.1 21-Feb-2015  martin Pull up following revision(s) (requested by he in ticket #530):
sys/netinet/tcp_output.c: revision 1.180
sys/netinet/tcp_input.c: revision 1.336
sys/netinet/tcp_usrreq.c: revision 1.203
share/man/man4/tcp.4: revision 1.30
sys/netinet/tcp.h: revision 1.31
sys/netinet/tcp_subr.c: revision 1.258
sys/netinet/tcp_var.h: revision 1.176
sys/netinet/tcp_var.h: revision 1.177
sys/sys/param.h: bump revision

Port over the TCP_INFO socket option from FreeBSD, originally from
the Linux 2.6 TCP API. This permits the caller to query certain information
about a TCP connection, and is used by pkgsrc's net/iperf3 test program
if available.

This extends struct tcpcb with three fields to count retransmits,
out-of-sequence receives and zero window announcements, and will
therefore warrant a kernel revision bump (done separately).

Change the new counter variables in struct tcpcb to uint32_t, as
per christos' comments.
 1.177.10.2 03-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as
much as possible to prevent any softint handlers including callout handlers
such as tcp_slowtimo from sticking on softnet_lock because it results in
undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference
to the route to avoid a deadlock, because an updater waits for references
to the target route (except for the reference held by the updater itself) to be released.
 1.177.10.1 21-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #300):
crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19
crypto/dist/ipsec-tools/src/setkey/token.l: 1.20
distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759
doc/TODO.smpnet: 1.12-1.13
sys/net/pfkeyv2.h: 1.32
sys/net/raw_cb.c: 1.23-1.24, 1.28
sys/net/raw_cb.h: 1.28
sys/net/raw_usrreq.c: 1.57-1.58
sys/net/rtsock.c: 1.228-1.229
sys/netinet/in_proto.c: 1.125
sys/netinet/ip_input.c: 1.359-1.361
sys/netinet/tcp_input.c: 1.359-1.360
sys/netinet/tcp_output.c: 1.197
sys/netinet/tcp_var.h: 1.178
sys/netinet6/icmp6.c: 1.213
sys/netinet6/in6_proto.c: 1.119
sys/netinet6/ip6_forward.c: 1.88
sys/netinet6/ip6_input.c: 1.181-1.182
sys/netinet6/ip6_output.c: 1.193
sys/netinet6/ip6protosw.h: 1.26
sys/netipsec/ipsec.c: 1.100-1.122
sys/netipsec/ipsec.h: 1.51-1.61
sys/netipsec/ipsec6.h: 1.18-1.20
sys/netipsec/ipsec_input.c: 1.44-1.51
sys/netipsec/ipsec_netbsd.c: 1.41-1.45
sys/netipsec/ipsec_output.c: 1.49-1.64
sys/netipsec/ipsec_private.h: 1.5
sys/netipsec/key.c: 1.164-1.234
sys/netipsec/key.h: 1.20-1.32
sys/netipsec/key_debug.c: 1.18-1.21
sys/netipsec/key_debug.h: 1.9
sys/netipsec/keydb.h: 1.16-1.20
sys/netipsec/keysock.c: 1.59-1.62
sys/netipsec/keysock.h: 1.10
sys/netipsec/xform.h: 1.9-1.12
sys/netipsec/xform_ah.c: 1.55-1.74
sys/netipsec/xform_esp.c: 1.56-1.72
sys/netipsec/xform_ipcomp.c: 1.39-1.53
sys/netipsec/xform_ipip.c: 1.50-1.54
sys/netipsec/xform_tcp.c: 1.12-1.16
sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170
sys/rump/librump/rumpnet/net_stub.c: 1.27
sys/sys/protosw.h: 1.67-1.68
tests/net/carp/t_basic.sh: 1.7
tests/net/if_gif/t_gif.sh: 1.11
tests/net/if_l2tp/t_l2tp.sh: 1.3
tests/net/ipsec/Makefile: 1.7-1.9
tests/net/ipsec/algorithms.sh: 1.5
tests/net/ipsec/common.sh: 1.4-1.6
tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2
tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2
tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7
tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18
tests/net/ipsec/t_ipsec_sockopt.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6
tests/net/ipsec/t_ipsec_tunnel.sh: 1.9
tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2
tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3
tests/net/mcast/t_mcast.sh: 1.6
tests/net/net/t_ipaddress.sh: 1.11
tests/net/net_common.sh: 1.20
tests/net/npf/t_npf.sh: 1.3
tests/net/route/t_flags.sh: 1.20
tests/net/route/t_flags6.sh: 1.16
usr.bin/netstat/fast_ipsec.c: 1.22
Do m_pullup before mtod

It may fix panics of some tests on anita/sparc and anita/GuruPlug.
---
KNF
---
Enable DEBUG for babylon5
---
Apply C99-style struct initialization to xformsw
---
Tweak outputs of netstat -s for IPsec

- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols

Original outputs were organized like this:

(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:

New outputs are organized like this:

ipsec:
ah:
esp:
ipip:
ipcomp:
ipsec6:
ah:
esp:
ipip:
ipcomp:
---
Add test cases for IPComp
---
Simplify IPSEC_OSTAT macro (NFC)
---
KNF; replace leading whitespaces with hard tabs
---
Introduce and use SADB_SASTATE_USABLE_P
---
KNF
---
Add update command for testing

Updating an SA (SADB_UPDATE) requires that the process issuing
SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI).
This means that the update command must be used with the add command in a
configuration of setkey. This usage is normally meaningless but
useful for testing (and debugging) purposes.
---
Add test cases for updating SA/SP

The tests require the newly added update command of setkey.
---
PR/52346: Frank Kardel: Fix checksumming for NAT-T
See XXX for improvements.
---
Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE

It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters
that have IPsec accelerators; a driver sets the mtag to a packet
when its device has already encrypted the packet.

Unfortunately no driver has implemented such offload features for many
years and none seems likely to implement them soon. (Note that neither
FreeBSD nor Linux has such drivers.) Let's remove the related
(unused) code and simplify the IPsec code.
---
Fix usages of sadb_msg_errno
---
Avoid updating sav directly

On SADB_UPDATE a target sav was updated directly, which was unsafe.
Instead allocate another sav, copy variables of the old sav to
the new one and replace the old one with the new one.
---
Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid
---
Rename key_alloc* functions (NFC)

We shouldn't use the term "alloc" for functions that just look up
data and actually don't allocate memory.
---
Use explicit_memset to surely zero-clear key_auth and key_enc
---
Make sure to clear keys on error paths of key_setsaval
---
Add missing KEY_FREESAV
---
Make sure a sav is inserted to a sah list after its initialization completes
---
Remove unnecessary zero-clearing codes from key_setsaval

key_setsaval is now used only for a newly-allocated sav. (It was
used to reset variables of an existing sav.)
---
Correct wrong assumption of sav->refcnt in key_delsah

A sav in a list should basically never have sav->refcnt == 0. Also,
KEY_FREESAV assumes sav->refcnt > 0.
---
Let key_getsavbyspi take a reference of a returning sav
---
Use time_mono_to_wall (NFC)
---
Separate sending message routine (NFC)
---
Simplify; remove unnecessary zero-clears

key_freesaval is used only when a target sav is being destroyed.
---
Omit NULL checks for sav->lft_c

sav->lft_c can be NULL only when initializing or destroying sav.
---
Omit unnecessary NULL checks for sav->sah
---
Omit unnecessary check of sav->state

key_allocsa_policy picks a sav of either MATURE or DYING so we
don't need to check its state again.
---
Simplify; omit unnecessary saidx passing

- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten from
another argument (isr)
---
Fix splx not being called on some error paths
---
Fix header size calculation of esp where sav is NULL
---
Fix header size calculation of ah in the case sav is NULL

This fix was also needed for esp.
---
Pass sav directly to opencrypto callback

In a callback, use a passed sav as-is by default and look up a sav
only if the passed sav is dead.
---
Avoid examining freshness of sav on packet processing

If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance,
we don't need to examine each sav and also don't need to delete one
on the fly and send up a message. Fortunately every sav list is sorted
as we need.

The added key_validate_savlist validates that each sav list is indeed sorted
(run only if DEBUG because it's not cheap).
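
A minimal sketch of such a sortedness check, for illustration only. The struct and function names are hypothetical stand-ins, not the ones in key.c, and ascending order is an assumption; the message above does not state the direction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a sav; only the creation time matters here. */
struct sav_sketch {
	uint64_t addtime;		/* role of lft_c->sadb_lifetime_addtime */
	struct sav_sketch *next;	/* next entry in the list, or NULL */
};

/*
 * Return true if every entry's addtime is <= its successor's.
 * Ascending order is an assumption made for this sketch.
 */
static bool
savlist_is_sorted(const struct sav_sketch *head)
{
	for (const struct sav_sketch *sav = head;
	    sav != NULL && sav->next != NULL; sav = sav->next) {
		if (sav->addtime > sav->next->addtime)
			return false;
	}
	return true;
}
```

The walk is linear in the list length, which is consistent with running it only under DEBUG.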
---
Add test cases for SAs with different SPIs
---
Prepare to stop using isr->sav

isr is a shared resource and using isr->sav as a temporary storage
for each packet processing is racy. And also having a reference from
isr to sav makes the lifetime of sav non-deterministic; such a reference
is removed when a packet is processed and isr->sav is overwritten by
new one. Let's have a sav locally for each packet processing instead of
using shared isr->sav.

However this change doesn't stop using isr->sav yet because there are
some users of isr->sav. isr->sav will be removed after the users find
a way to not use isr->sav.
---
Fix wrong argument handling
---
fix printf format.
---
Don't validate sav lists of LARVAL or DEAD states

We don't sort the lists so the validation will always fail.

Fix PR kern/52405
---
Make sure to sort the list when changing the state by key_sa_chgstate
---
Rename key_allocsa_policy to key_lookup_sa_bysaidx
---
Separate test files
---
Calculate ah_max_authsize on initialization as well as esp_max_ivlen
---
Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag
---
Restore a comment removed in previous

The comment is valid for the below code.
---
Make tests more stable

sleep command seems to wait longer than expected on anita so
use polling to wait for a state change.
---
Add tests that explicitly delete SAs instead of waiting for expirations
---
Remove invalid M_AUTHIPDGM check on ESP isr->sav

The M_AUTHIPDGM flag is set on an mbuf in ah_input_cb. An sav of ESP can
have AH authentication as sav->tdb_authalgxform. However, in that
case esp_input and esp_input_cb are used to do ESP decryption and
AH authentication and M_AUTHIPDGM is never set on an mbuf. So
checking M_AUTHIPDGM of an mbuf on isr->sav of ESP is meaningless.
---
Look up sav instead of relying on unstable sp->req->sav

This code is executed only in an error path so an additional lookup
doesn't matter.
---
Correct a comment
---
Don't release sav if calling crypto_dispatch again
---
Remove extra KEY_FREESAV from ipsec_process_done

It should be done by the caller.
---
Don't bother the case of crp->crp_buf == NULL in callbacks
---
Hold a reference to an SP during opencrypto processing

An SP has a list of isr (ipsecrequest) that represents a sequence
of IPsec encryption/authentication processing. One isr corresponds
to one opencrypto processing. The lifetime of an isr follows its SP.

We pass an isr to a callback function of opencrypto to continue
to a next encryption/authentication processing. However nobody
guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.

In order to avoid such unexpected destruction of isr, hold a reference
to its SP during opencrypto processing.
---
Don't make SAs expired on tests that delete SAs explicitly
---
Fix a debug message
---
Dedup error paths (NFC)
---
Use pool to allocate tdb_crypto

For ESP and AH, we need to allocate an extra variable space in addition
to struct tdb_crypto. The fixed size of pool items may be larger than
an actual requisite size of a buffer, but still the performance
improvement by replacing malloc with pool wins.
---
Don't use unstable isr->sav for header size calculations

We may need to optimize to not look up sav here for users that
don't need to know an exact size of headers (e.g., TCP segment size
calculation).
---
Don't use sp->req->sav when handling NAT-T ESP fragmentation

In order to do this we need to look up a sav however an additional
look-up degrades performance. A sav is later looked up in
ipsec4_process_packet so delay the fragmentation check until then
to avoid an extra look-up.
---
Don't use key_lookup_sp that depends on unstable sp->req->sav

It provided a fast look-up of SP. We will provide an alternative
method in the future (after basic MP-ification finishes).
---
Stop setting isr->sav on looking up sav in key_checkrequest
---
Remove ipsecrequest#sav
---
Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore
---
Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu

Probably due to PR 43997
---
Add localcount to rump kernels
---
Remove unused macro
---
Fix key_getcomb_setlifetime

The fix adjusts a soft limit to be 80% of a corresponding hard limit.

I'm not sure the fix is really correct though, at least the original
code is wrong. A passed comb is zero-cleared before calling
key_getcomb_setlifetime, so
comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100;
is meaningless (it always yields zero).
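
The arithmetic can be sketched as follows. The struct and function are hypothetical simplifications (the real sadb_comb has more fields); the point is only that scaling a zero-cleared soft field yields zero, whereas deriving it from the hard limit gives 80%.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct sadb_comb. */
struct comb_sketch {
	uint64_t hard_addtime;	/* hard lifetime limit, in seconds */
	uint64_t soft_addtime;	/* soft lifetime limit, in seconds */
};

/* Set the hard limit and derive the soft limit as 80% of it. */
static void
comb_set_lifetime(struct comb_sketch *comb, uint64_t hard_seconds)
{
	comb->hard_addtime = hard_seconds;
	comb->soft_addtime = comb->hard_addtime * 80 / 100;
}
```

With a hard limit of 86400 seconds, the derived soft limit is 69120 seconds.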
---
Provide and apply key_sp_refcnt (NFC)

It simplifies further changes.
---
Fix indentation

Pointed out by knakahara@
---
Use pslist(9) for sptree
---
Don't acquire global locks for IPsec if NET_MPSAFE

Note that the change is just to make testing easy and IPsec isn't MP-safe yet.
---
Let PF_KEY socks hold their own lock instead of softnet_lock

Operations on SAD and SPD are executed via PF_KEY socks. The operations
include deletions of SAs and SPs that will use synchronization mechanisms
such as pserialize_perform to wait for references to SAs and SPs to be
released. It is known that using such mechanisms while holding softnet_lock
causes a deadlock. We should avoid the situation.
---
Make IPsec SPD MP-safe

We use localcount(9), not psref(9), to make the sptree and secpolicy (SP)
entries MP-safe because SPs need to be referenced over opencrypto
processing that executes a callback in a different context.

SPs on sockets aren't managed by the sptree and can be destroyed in softint.
localcount_drain cannot be used in softint so we delay the destruction of
such SPs to a thread context. To do so, a list to manage such SPs is added
(key_socksplist) and key_timehandler_spd deletes dead SPs in the list.

For more details please read the locking notes in key.c.

Proposed on tech-kern@ and tech-net@
---
Fix updating ipsec_used

- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but
a reply to userland failed
---
Fix updating ipsec_used; turn on when SPs on sockets are added
---
Add missing IPsec policy checks to icmp6_rip6_input

icmp6_rip6_input is quite similar to rip6_input and the same checks exist
in rip6_input.
---
Add test cases for setsockopt(IP_IPSEC_POLICY)
---
Don't use KEY_NEWSP for dummy SP entries

By the change KEY_NEWSP is now not called from softint anymore
and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP.
---
Comment out unused functions
---
Add test cases that there are SPs but no relevant SAs
---
Don't allow sav->lft_c to be NULL

lft_c of an sav that was created by SADB_GETSPI could be NULL.
---
Clean up clunky eval strings

- Remove unnecessary \ at EOL
- This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
- We expect it's expanded on execution

Suggested by kre@
---
Remove unnecessary KEY_FREESAV in an error path

sav should be freed (unreferenced) by the caller.
---
Use pslist(9) for sahtree
---
Use pslist(9) for sah->savtree
---
Rename local variable newsah to sah

It may not be new.
---
MP-ify SAD slightly

- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
---
Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future

KEY_SA_UNREF is still key_freesav so no functional change for now.

This change reduces diff of further changes.
---
Remove out-of-date log output

Pointed out by riastradh@
---
Use KDASSERT instead of KASSERT for mutex_ownable

Because mutex_ownable is too heavy to run in a fast path
even for DIAGNOSTIC + LOCKDEBUG.

Suggested by riastradh@
---
Assemble global lists and related locks into cache lines (NFCI)

Also rename variable names from *tree to *list because they are
just lists, not trees.

Suggested by riastradh@
---
Move locking notes
---
Update the locking notes

- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations

Requested by riastradh@
---
Describe constraints of key_sp_ref and key_sp_unref

Requested by riastradh@
---
Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL
---
Add __read_mostly to key_psz

Suggested by riastradh@
---
Tweak wording (pserialize critical section => pserialize read section)

Suggested by riastradh@
---
Add missing mutex_exit
---
Fix setkey -D -P outputs

The outputs were tweaked (by me), but I forgot updating libipsec
in my local ATF environment...
---
MP-ify SAD (key_sad.sahlist and sah entries)

localcount(9) is used to protect key_sad.sahlist and sah entries
as well as SPD (and will be used for SAD sav).

Please read the locking notes of SAD for more details.
---
Introduce key_sa_refcnt and replace sav->refcnt with it (NFC)
---
Destroy sav only in the loop for DEAD sav
---
Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf

If key_sendup_mbuf isn't passed a socket, the assertion fails.
Originally in this case sb->sb_so was softnet_lock and callers
held softnet_lock so the assertion was magically satisfied.
Now sb->sb_so is key_so_mtx and also softnet_lock isn't always
held by callers so the assertion can fail.

Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.

Reported by knakahara@
Tested by knakahara@ and ozaki-r@
---
Fix locking notes of SAD
---
Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain

If we call key_sendup_mbuf from key_acquire that is called on packet
processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with
holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf,
which is stuck on key_so_mtx, never releases its reference to the SP

Fix the deadlock by deferring key_sendup_mbuf to the timer
(key_timehandler).
---
Fix that prev isn't cleared on retry
---
Limit the number of mbufs queued for deferred key_sendup_mbuf

Hundreds of mbufs can easily be queued on the list under heavy
network load.
---
MP-ify SAD (savlist)

localcount(9) is used to protect savlist of sah. The basic design is
similar to MP-ifications of SPD and SAD sahlist. Please read the
locking notes of SAD for more details.
---
Simplify ipsec_reinject_ipstack (NFC)
---
Add per-CPU rtcache to ipsec_reinject_ipstack

It reduces route lookups and also reduces rtcache lock contentions
when NET_MPSAFE is enabled.
---
Use pool_cache(9) instead of pool(9) for tdb_crypto objects

The change improves network throughput especially on multi-core systems.
---
Update

ipsec(4), opencrypto(9) and vlan(4) are now MP-safe.
---
Write known issues on scalability
---
Share a global dummy SP between PCBs

It is never changed so it can be pre-allocated and shared safely between PCBs.
---
Fix race condition on the rawcb list shared by rtsock and keysock

keysock now protects itself by its own mutex, which means that
the rawcb list is protected by two different mutexes (keysock's one
and softnet_lock for rtsock), of course it's useless.

Fix the situation by having a discrete rawcb list for each.
---
Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE
---
fix localcount leak in sav. fixed by ozaki-r@n.o.

I commit on behalf of him.
---
remove unnecessary comment.
---
Fix deadlock between pserialize_perform and localcount_drain

A typical usage of localcount_drain looks like this:

mutex_enter(&mtx);
item = remove_from_list();
pserialize_perform(psz);
localcount_drain(&item->localcount, &cv, &mtx);
mutex_exit(&mtx);

This sequence can cause a deadlock, which happens for example in the following
situation:

- Thread A calls localcount_drain which calls xc_broadcast after releasing
a specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding
the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries
to hold the mutex

xc_broadcast of thread B doesn't start until xc_broadcast of thread A
finishes, which is a feature of xcall(9). This means that pserialize_perform
never completes until xc_broadcast of thread A finishes. On the other hand,
thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex.
Finally the threads block each other (A blocks B, B blocks C and C blocks A).

A possible fix is to serialize executions of the above sequence by another
mutex, but adding another mutex makes the code complex, so fix the deadlock
by another way; the fix is to release the mutex before pserialize_perform
and instead use a condvar to prevent pserialize_perform from being called
simultaneously.

Note that the deadlock has happened only if NET_MPSAFE is enabled.
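
The shape of that fix can be sketched in userland with POSIX threads. This is an illustration under assumptions, not the kernel code: fake_barrier stands in for pserialize_perform, and all names here are hypothetical. The mutex is dropped before the barrier, and a flag plus a condition variable keeps two threads from running the barrier at the same time.

```c
#include <pthread.h>
#include <stdbool.h>

struct drain_gate {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool barrier_busy;		/* true while some thread runs the barrier */
	int barrier_calls;		/* bookkeeping for the example only */
	bool mtx_free_in_barrier;	/* observed: mutex not held in barrier */
};

static void
drain_gate_init(struct drain_gate *g)
{
	pthread_mutex_init(&g->mtx, NULL);
	pthread_cond_init(&g->cv, NULL);
	g->barrier_busy = false;
	g->barrier_calls = 0;
	g->mtx_free_in_barrier = false;
}

/* Stand-in for pserialize_perform; runs with the mutex released. */
static void
fake_barrier(struct drain_gate *g)
{
	g->barrier_calls++;
	if (pthread_mutex_trylock(&g->mtx) == 0) {
		g->mtx_free_in_barrier = true;
		pthread_mutex_unlock(&g->mtx);
	}
}

static void
remove_and_wait(struct drain_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	/* item = remove_from_list(); would happen here */
	while (g->barrier_busy)
		pthread_cond_wait(&g->cv, &g->mtx);
	g->barrier_busy = true;
	pthread_mutex_unlock(&g->mtx);	/* do not hold the mutex across the barrier */

	fake_barrier(g);

	pthread_mutex_lock(&g->mtx);
	g->barrier_busy = false;
	pthread_cond_broadcast(&g->cv);
	/* the localcount_drain step would follow here, with the mutex held */
	pthread_mutex_unlock(&g->mtx);
}
```

Because the barrier runs without the mutex, a callback that needs the mutex (thread C above) can make progress while another thread is inside the barrier.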
---
Add missing ifdef NET_MPSAFE
---
Take softnet_lock on pr_input properly if NET_MPSAFE

Currently softnet_lock is taken unnecessarily in some cases, e.g.,
icmp_input and encap4_input from ip_input, or not taken even if needed,
e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.

NFC if NET_MPSAFE is disabled (default).
---
- sanitize key debugging so that we don't print extra newlines or unassociated
debugging messages.
- remove unused functions and make internal ones static
- print information in one line per message
---
humanize printing of ip addresses
---
cast reduction, NFC.
---
Fix typo in comment
---
Pull out ipsec_fill_saidx_bymbuf (NFC)
---
Don't abuse key_checkrequest just for looking up sav

It does more than expected, for example key_acquire.
---
Fix SP being broken in transport mode

isr->saidx was modified accidentally in ipsec_nextisr.

Reported by christos@
Helped investigations by christos@ and knakahara@
---
Constify isr at many places (NFC)
---
Include socketvar.h for softnet_lock
---
Fix buffer length for ipsec_logsastr
 1.184.2.5 18-Jan-2019  pgoyette Synch with HEAD
 1.184.2.4 30-Sep-2018  pgoyette Sync with HEAD
 1.184.2.3 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.184.2.2 02-May-2018  pgoyette Synch with HEAD
 1.184.2.1 30-Mar-2018  pgoyette Resolve conflicts between branch and HEAD
 1.186.2.1 10-Jun-2019  christos Sync with HEAD
 1.192.4.1 03-Apr-2021  thorpej Sync with HEAD.
 1.195.4.1 01-Aug-2021  thorpej Sync with HEAD.
 1.198.8.1 02-Aug-2025  perseant Sync with HEAD
 1.25 07-Oct-2024  jakllsch Allow CACHE_LINE_SIZE 256 with uint64_t fatp_word_t
 1.24 04-Nov-2022  ozaki-r branches: 1.24.8;
inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.23 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
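
A minimal sketch of the embedding-plus-accessor pattern described above. The struct layouts and the `_sketch` names are hypothetical; only the idea (a common struct embedded as the first member, with a macro that reaches the IPv4-only field) is taken from the message.

```c
#include <stdint.h>

/* Common part shared by all address families (hypothetical layout). */
struct inpcb_sketch {
	int inp_af;
};

/* IPv4 PCB: embeds the common part first, then IPv4-only data. */
struct in4pcb_sketch {
	struct inpcb_sketch in4p_pcb;	/* must be the first member */
	uint32_t in4p_local;		/* local IPv4 address */
};

/*
 * Accessor in the spirit of in4p_laddr(inp): callers hold a pointer to the
 * common struct and never spell out the containing type themselves.
 */
#define in4p_laddr_sketch(inp) \
	(((struct in4pcb_sketch *)(inp))->in4p_local)
```

The cast is valid because a pointer to a struct's first member can be converted to a pointer to the enclosing struct.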
 1.22 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.21 13-Aug-2021  andvar fix typos in words "pointer" and s/fram /frame/
 1.20 01-Oct-2019  chs in many device attach paths, allocate memory with KM_SLEEP instead of KM_NOSLEEP
and remove code to handle failures that can no longer happen.
 1.19 03-May-2018  maxv branches: 1.19.2;
Remove now unused tcpip.h includes. Some were already unused before.
 1.18 01-Jun-2017  chs branches: 1.18.8;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.17 13-Dec-2016  ozaki-r Remove unnecessary inclusions of nd6.h
 1.16 28-Jul-2016  martin PR kern/51371: avoid shifting negative values
 1.15 26-Apr-2016  ozaki-r branches: 1.15.2;
Sweep unnecessary route.h inclusions
 1.14 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.13 31-Mar-2015  ozaki-r Remove unnecessary opt_ipsec.h inclusions
 1.12 10-Nov-2014  maxv branches: 1.12.2;
Do not uselessly include <sys/malloc.h>.
 1.11 05-Sep-2014  matt Don't use C++ keywords (class, template) as variables
 1.10 15-Sep-2013  martin branches: 1.10.4;
ifdef a variable like its use
 1.9 13-Apr-2012  yamt branches: 1.9.2; 1.9.4;
add a big comment
(copy and paste from cvs log rev.1.1)
 1.8 17-Jul-2011  joerg branches: 1.8.2; 1.8.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.7 06-Jun-2011  dyoung Don't allocate resources for vtw until/unless it is enabled. This will
further help those machines where memory is in short supply.

TBD: release resources after vtw is disabled and all entries have
expired.
 1.6 03-Jun-2011  dyoung branches: 1.6.2;
Don't sleep until memory becomes available.

Use kmem_zalloc() instead of kmem_alloc() + bzero().

During initialization, try to get all of the memory we need for the
vestigial time-wait structures before we set any of the structures up,
and if any single allocation fails, release all of the memory.

This should help low-memory hosts. A much better fix postpones
allocating any memory until vtw is enabled through the sysctl.
 1.5 03-Jun-2011  dyoung Defer scheduling vtw_tick() and setting the vtw hooks until
vtw_control() is called. In this way, vtw_tick() will be re-scheduled
repeatedly while vtw is in use.

Pay tcp_vtw_was_enabled no attention in vtw_earlyinit(), since it's
always going to be 0 during initialization.
 1.4 17-May-2011  dholland branches: 1.4.2; 1.4.4;
typo in comment
 1.3 11-May-2011  drochner use getmicrouptime(9) rather than microtime(9) for TIME_WAIT duration
calculation, because this doesn't get confused by system time changes,
and uses less CPU cycles
reviewed by dyoung
 1.2 06-May-2011  drochner remove an empty function
 1.1 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
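
The class-to-MSL mapping can be sketched as below. The enum, function names and boolean inputs are hypothetical; the default values (2, 10 and 60 seconds) are the ones stated above.

```c
#include <stdbool.h>

enum msl_class_sketch { MSL_LOOPBACK, MSL_LOCAL, MSL_REMOTE };

/* Classify a session by the nearness of its peer. */
static enum msl_class_sketch
classify_peer(bool same_host, bool same_link)
{
	if (same_host)
		return MSL_LOOPBACK;
	if (same_link)
		return MSL_LOCAL;
	return MSL_REMOTE;
}

/* Default MSL, in seconds, for each class. */
static unsigned
msl_seconds(enum msl_class_sketch c)
{
	switch (c) {
	case MSL_LOOPBACK:
		return 2;
	case MSL_LOCAL:
		return 10;
	default:
		return 60;
	}
}
```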

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
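
The idea of linking pool elements by a narrow index instead of a pointer can be sketched as follows. This is an illustration only: the pool size, field names and functions are hypothetical, and the oldest-first discard described above is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE 8
#define IDX_NONE UINT16_MAX	/* plays the role of a NULL link */

struct vpcb_sketch {
	uint32_t key;		/* stand-in for the session identity */
	uint16_t next;		/* index of the next element, not a pointer */
};

struct pool_sketch {
	struct vpcb_sketch slot[POOL_SIZE];
	uint16_t used;		/* number of slots handed out so far */
};

/* Push a key onto the chain starting at head; returns the new head index. */
static uint16_t
chain_push(struct pool_sketch *p, uint16_t head, uint32_t key)
{
	if (p->used == POOL_SIZE)
		return head;	/* full; eviction is omitted in this sketch */
	uint16_t idx = p->used++;
	p->slot[idx].key = key;
	p->slot[idx].next = head;
	return idx;
}

/* Walk the chain by index and report whether key is present. */
static bool
chain_contains(const struct pool_sketch *p, uint16_t head, uint32_t key)
{
	for (uint16_t i = head; i != IDX_NONE; i = p->slot[i].next) {
		if (p->slot[i].key == key)
			return true;
	}
	return false;
}
```

A 16-bit index occupies a quarter of the space of a 64-bit pointer, which is where the memory saving for linked structures comes from.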

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.4.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.4.2.3 12-Jun-2011  rmind sync with head
 1.4.2.2 31-May-2011  rmind sync with head
 1.4.2.1 17-May-2011  rmind file tcp_vtw.c was added on branch rmind-uvmplock on 2011-05-31 03:05:08 +0000
 1.6.2.2 06-Jun-2011  jruoho Sync with HEAD.
 1.6.2.1 03-Jun-2011  jruoho file tcp_vtw.c was added on branch jruoho-x86intr on 2011-06-06 09:09:57 +0000
 1.8.6.1 29-Apr-2012  mrg sync to latest -current.
 1.8.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.2.1 17-Apr-2012  yamt sync with head
 1.9.4.3 18-May-2014  rmind sync with head
 1.9.4.2 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.9.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.9.2.2 03-Dec-2017  jdolecek update from HEAD
 1.9.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.4.1 17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.12.2.6 28-Aug-2017  skrll Sync with HEAD
 1.12.2.5 05-Feb-2017  skrll Sync with HEAD
 1.12.2.4 05-Oct-2016  skrll Sync with HEAD
 1.12.2.3 29-May-2016  skrll Sync with HEAD
 1.12.2.2 22-Sep-2015  skrll Sync with HEAD
 1.12.2.1 06-Apr-2015  skrll Sync with HEAD
 1.15.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.15.2.1 06-Aug-2016  pgoyette Sync with HEAD
 1.18.8.1 21-May-2018  pgoyette Sync with HEAD
 1.19.2.1 13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.24.8.1 02-Aug-2025  perseant Sync with HEAD
 1.11 07-Oct-2024  jakllsch Allow CACHE_LINE_SIZE 256 with uint64_t fatp_word_t
 1.10 11-Dec-2022  mlelstv branches: 1.10.8;
Need larger fat pointers for 128bit cache lines.
 1.9 19-Apr-2018  christos s/static inline/static __inline/g for consistency.
 1.8 13-Dec-2016  ozaki-r branches: 1.8.14;
Remove unnecessary inclusions of nd6.h
 1.7 26-Apr-2016  ozaki-r branches: 1.7.2;
Sweep unnecessary route.h inclusions
 1.6 23-Nov-2012  joerg branches: 1.6.2; 1.6.14;
Add RCS keyword. Avoid overflow in constant.
 1.5 07-Jun-2011  joerg branches: 1.5.2; 1.5.12;
Be a bit cleaner and reduce the amount of namespace pollution
 1.4 06-Jun-2011  dyoung Don't allocate resources for vtw until/unless it is enabled. This will
further help those machines where memory is in short supply.

TBD: release resources after vtw is disabled and all entries have
expired.
 1.3 17-May-2011  dholland branches: 1.3.2; 1.3.4; 1.3.6;
typo in comment
 1.2 03-May-2011  dyoung Remove #ifdef INET6 throughout.
 1.1 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
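
A minimal sketch of the class-to-MSL mapping described above, assuming
IPv4 addresses in host byte order and a single netmask for the local
link. The names msl_class, msl_classify and msl_seconds are hypothetical
and are not taken from tcp_vtw.c; only the three classes and the 2/10/60
second defaults come from the log message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; not the identifiers used in the kernel. */
enum msl_class { MSL_CLASS_LOOPBACK, MSL_CLASS_LOCAL, MSL_CLASS_REMOTE };

/* Default MSL per class, in seconds, as quoted in the log message. */
static const unsigned msl_default[] = { 2, 10, 60 };

/* Classify a session by how near the peer is (assumed rule set). */
static enum msl_class
msl_classify(uint32_t laddr, uint32_t raddr, uint32_t netmask)
{
	if (laddr == raddr)
		return MSL_CLASS_LOOPBACK;	/* local host equals remote host */
	if ((laddr & netmask) == (raddr & netmask))
		return MSL_CLASS_LOCAL;		/* same link/subnet */
	return MSL_CLASS_REMOTE;		/* reached via a gateway */
}

/* A session uses the MSL of its class. */
static unsigned
msl_seconds(enum msl_class c)
{
	assert((unsigned)c < sizeof(msl_default) / sizeof(msl_default[0]));
	return msl_default[c];
}
```

The table keeps the defaults in one place, so a tunable per-class MSL
would only need to replace the array contents.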

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
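
The two memory-saving ideas above (narrow indices instead of pointers,
and oldest-first discard from a fixed-size pool) can be sketched as
follows. This is an illustration under stated assumptions, not the
tcp_vtw.c data layout: the demo_* names, the 16-bit index width and the
four-slot pool are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_POOL_SIZE 4u          /* tiny on purpose; illustration only */
#define DEMO_IDX_NONE  UINT16_MAX  /* plays the role of a null reference */

/* Hypothetical compact record standing in for a vestigial PCB. */
struct demo_vestige {
	uint32_t laddr, raddr;
	uint16_t lport, rport;
	uint16_t next;             /* chain link: a pool index, not a pointer */
};

struct demo_pool {
	struct demo_vestige slot[DEMO_POOL_SIZE];
	uint16_t head;             /* index of the oldest live entry */
	uint16_t count;            /* number of live entries */
};

/* Insert a record; when the pool is full, overwrite the oldest entry. */
static uint16_t
demo_pool_insert(struct demo_pool *p, struct demo_vestige v)
{
	uint16_t idx;

	assert(p != NULL);
	if (p->count == DEMO_POOL_SIZE) {
		idx = p->head;     /* discard oldest first */
		p->head = (uint16_t)((p->head + 1u) % DEMO_POOL_SIZE);
	} else {
		idx = (uint16_t)((p->head + p->count) % DEMO_POOL_SIZE);
		p->count++;
	}
	v.next = DEMO_IDX_NONE;
	p->slot[idx] = v;
	return idx;
}
```

A 16-bit index costs 2 bytes per link where a pointer costs 8 on a
64-bit machine, which is the kind of saving the log message refers to.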

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.3.6.2 06-Jun-2011  jruoho Sync with HEAD.
 1.3.6.1 17-May-2011  jruoho file tcp_vtw.h was added on branch jruoho-x86intr on 2011-06-06 09:09:57 +0000
 1.3.4.1 23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.3.2.3 12-Jun-2011  rmind sync with head
 1.3.2.2 31-May-2011  rmind sync with head
 1.3.2.1 17-May-2011  rmind file tcp_vtw.h was added on branch rmind-uvmplock on 2011-05-31 03:05:08 +0000
 1.5.12.2 03-Dec-2017  jdolecek update from HEAD
 1.5.12.1 25-Feb-2013  tls resync with head
 1.5.2.1 16-Jan-2013  yamt sync with (a bit old) head
 1.6.14.2 05-Feb-2017  skrll Sync with HEAD
 1.6.14.1 29-May-2016  skrll Sync with HEAD
 1.6.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.7.2.1 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.8.14.1 22-Apr-2018  pgoyette Sync with HEAD
 1.10.8.1 02-Aug-2025  perseant Sync with HEAD
 1.12 29-Apr-2018  maxv Move struct tcpiphdr from tcpip.h to tcp_var.h, to match UDP (udpiphdr in
udp_var.h).

tcpip.h is now empty, and can be removed.
 1.11 25-Dec-2007  perry branches: 1.11.96;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.10 10-Dec-2005  elad branches: 1.10.46; 1.10.52; 1.10.56; 1.10.60;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.9 07-Aug-2003  agc branches: 1.9.16;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.8 20-Nov-1999  thorpej branches: 1.8.28;
Add the `packed' attribute to structures which describe wire protocol data.
 1.7 10-Feb-1998  perry branches: 1.7.14; 1.7.20;
add/cleanup multiple inclusion protection.
 1.6 21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
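
A rough sketch of the "queue element" idea, assuming a plain 20-byte
header and a circular doubly linked list. The demo_* names are
hypothetical; the point is only that the links live in a separate
structure, so the header layout no longer depends on pointer width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an IP header: a fixed 20-byte wire layout. */
struct demo_iphdr { uint8_t octets[20]; };
static_assert(sizeof(struct demo_iphdr) == 20,
    "wire layout must not change with pointer size");

/* Queue element: carries the links and points at the header. */
struct demo_ipqent {
	struct demo_ipqent *next;
	struct demo_ipqent *prev;
	struct demo_iphdr *ip;
};

/* Make an empty circular queue out of a head element. */
static void
demo_queue_init(struct demo_ipqent *head)
{
	head->next = head->prev = head;
	head->ip = NULL;
}

/* Link element e into the queue immediately after pos. */
static void
demo_insert_after(struct demo_ipqent *pos, struct demo_ipqent *e)
{
	e->next = pos->next;
	e->prev = pos;
	pos->next->prev = e;
	pos->next = e;
}
```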
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.7.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.7.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.28.4 11-Dec-2005  christos Sync with head.
 1.8.28.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.8.28.2 18-Sep-2004  skrll Sync with HEAD.
 1.8.28.1 03-Aug-2004  skrll Sync with HEAD
 1.9.16.2 21-Jan-2008  yamt sync with head
 1.9.16.1 21-Jun-2006  yamt sync with head.
 1.10.60.1 02-Jan-2008  bouyer Sync with HEAD
 1.10.56.1 26-Dec-2007  ad Sync with head.
 1.10.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.10.46.1 09-Jan-2008  matt sync with HEAD
 1.11.96.1 02-May-2018  pgoyette Synch with HEAD
 1.19 03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.18 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.17 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
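
A small sketch of enforcing an on-wire layout with compile-time
assertions rather than __packed, using C11 static_assert as a stand-in
for the kernel's CTASSERT/__CTASSERT. The structure demo_udphdr is a
hypothetical example, not a copy of any header in this directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four naturally aligned 16-bit fields: no padding is expected. */
struct demo_udphdr {
	uint16_t sport;
	uint16_t dport;
	uint16_t ulen;
	uint16_t sum;
};

/* The build fails if the compiler ever lays the struct out differently. */
static_assert(sizeof(struct demo_udphdr) == 8,
    "UDP header must be 8 bytes");
static_assert(offsetof(struct demo_udphdr, ulen) == 4,
    "length field must sit at offset 4");
static_assert(offsetof(struct demo_udphdr, sum) == 6,
    "checksum field must sit at offset 6");
```

Because the fields are already naturally aligned, the assertions hold
without any packing attribute, and member accesses do not trigger
packed-object warnings.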
 1.16 22-Jun-2012  christos branches: 1.16.52;
PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.15 07-Jan-2012  christos u_intN -> uintN
make standalone
 1.14 24-Sep-2011  christos branches: 1.14.2; 1.14.6;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.13 25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.12 10-Dec-2005  elad branches: 1.12.46; 1.12.52; 1.12.56; 1.12.60;
Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.11 12-Feb-2005  manu branches: 1.11.6;
Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.10 07-Aug-2003  agc branches: 1.10.8; 1.10.10;
Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.9 20-Nov-1999  thorpej branches: 1.9.28;
Add the `packed' attribute to structures which describe wire protocol data.
 1.8 10-Feb-1998  perry branches: 1.8.12; 1.8.14; 1.8.20;
add/cleanup multiple inclusion protection.
 1.7 25-Oct-1996  thorpej Make length and offset fields unsigned. From Kevin M. Lahey <kml@nas.nasa.gov>
 1.6 13-Apr-1995  cgd branches: 1.6.6;
be a bit more careful and explicit with types. (basically a large no-op.)
 1.5 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.4 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.6.6.1 10-Nov-1996  thorpej Update from trunk:
- Make ip_len and ip_off unsigned.
- Make sure we don't accept or transmit packets larger than the
maximum IP packet size.
This fixes the so-called `death ping' bug.

Sum of work from Bill Fenner <fenner@parc.xerox.com>,
Kevin Lahey <kml@nas.nasa.gov>, and myself.

Thanks to Curt Sampson, Jukka Marin, and Kevin Lahey for testing
this under NetBSD 1.2
 1.8.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.8.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.8.12.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.8.12.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.8.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.9.28.5 11-Dec-2005  christos Sync with head.
 1.9.28.4 15-Feb-2005  skrll Sync with HEAD.
 1.9.28.3 21-Sep-2004  skrll Fix the sync with head I botched.
 1.9.28.2 18-Sep-2004  skrll Sync with HEAD.
 1.9.28.1 03-Aug-2004  skrll Sync with HEAD
 1.10.10.1 12-Feb-2005  yamt sync with head.
 1.10.8.1 29-Apr-2005  kent sync with -current
 1.11.6.2 21-Jan-2008  yamt sync with head
 1.11.6.1 21-Jun-2006  yamt sync with head.
 1.12.60.1 02-Jan-2008  bouyer Sync with HEAD
 1.12.56.1 26-Dec-2007  ad Sync with head.
 1.12.52.1 18-Feb-2008  mjf Sync with HEAD.
 1.12.46.1 09-Jan-2008  matt sync with HEAD
 1.14.6.1 18-Feb-2012  mrg merge to -current.
 1.14.2.2 30-Oct-2012  yamt sync with head
 1.14.2.1 17-Apr-2012  yamt sync with head
 1.16.52.1 03-Apr-2021  thorpej Sync with HEAD.
 1.6 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
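
To illustrate why an alignof-based check differs from a sizeof-based
one, here is a hedged sketch using C11 alignof. The DEMO_* macros are
hypothetical stand-ins and are not the kernel's definitions of
ALIGNED_POINTER or ACCESSIBLE_POINTER; DEMO_CPU_ALLOWS_UNALIGNED is an
invented configuration switch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* On common ABIs this type has size 4 but only needs alignment 2. */
struct demo_pair { uint16_t a; uint16_t b; };

/* New style: test against the ABI alignment of the type. */
#define DEMO_ALIGNED_POINTER(p, t) \
	((((uintptr_t)(p)) & (alignof(t) - 1)) == 0)

/* Old style: test against the size, which over-constrains structs. */
#define DEMO_ALIGNED_BY_SIZE(p, t) \
	((((uintptr_t)(p)) & (sizeof(t) - 1)) == 0)

/* Always true when the CPU tolerates unaligned accesses. */
#ifdef DEMO_CPU_ALLOWS_UNALIGNED
#define DEMO_ACCESSIBLE_POINTER(p, t) 1
#else
#define DEMO_ACCESSIBLE_POINTER(p, t) DEMO_ALIGNED_POINTER(p, t)
#endif

/* The mask trick above relies on alignments being powers of two. */
static_assert((alignof(uint32_t) & (alignof(uint32_t) - 1)) == 0,
    "alignment must be a power of two");
```

With these definitions an address at offset 2 in an 8-byte-aligned
buffer passes the alignof-based test for struct demo_pair but fails the
sizeof-based one, even though accessing it there is fine.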
 1.5 17-Feb-2021  christos - pass the alignment instead of the mask (as Roy asked and to match the
other macro)
- use alignof to determine that alignment and CTASSERT what we expect
- remove unused macros
 1.4 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.3 28-Apr-2008  martin branches: 1.3.4; 1.3.102;
Remove clause 3 and 4 from TNF licenses
 1.2 23-Apr-2008  thorpej branches: 1.2.2;
Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.1 12-Apr-2008  thorpej branches: 1.1.2;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.1.2.1 18-May-2008  yamt sync with head.
 1.2.2.1 16-May-2008  yamt sync with head.
 1.3.102.1 03-Apr-2021  thorpej Sync with HEAD.
 1.3.4.2 02-Jun-2008  mjf Sync with HEAD.
 1.3.4.1 28-Apr-2008  mjf file udp_private.h was added on branch mjf-devfs2 on 2008-06-02 13:24:25 +0000
 1.266 08-Oct-2024  riastradh udp(4): Clarify udp4/6_espinudp and inp_overudp_cb return.

Cleanup to detect problems like this earlier:

PR kern/58688: userland panic of kernel via wg(4)
 1.265 05-Jul-2024  rin sys: Drop redundant NULL check before m_freem(9)

m_freem(9) safely has accepted NULL argument at least since 4.2BSD:
https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c

Compile-tested on amd64/ALL.

Suggested by knakahara@
 1.264 04-Nov-2022  ozaki-r branches: 1.264.8;
inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.263 28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
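
The embedding-plus-accessor pattern described above might look roughly
like this. The demo_* structures and field sets are hypothetical and
far smaller than the real PCBs; only the idea of in4pcb/in6pcb
embedding a common inpcb and being reached through a macro comes from
the log message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Common part shared by both address families (illustrative fields). */
struct demo_inpcb {
	int af;
	uint16_t lport;
	uint16_t fport;
};

/* IPv4 PCB: common part first, then only the IPv4-sized addresses. */
struct demo_in4pcb {
	struct demo_inpcb pcb;
	uint32_t laddr;
	uint32_t faddr;
};

/* IPv6 PCB: common part first, then the larger IPv6 addresses. */
struct demo_in6pcb {
	struct demo_inpcb pcb;
	uint8_t laddr[16];
	uint8_t faddr[16];
};

/* The accessor cast is only valid because the common part comes first. */
static_assert(offsetof(struct demo_in4pcb, pcb) == 0,
    "embedded header must be the first member");

/* Callers hold a struct demo_inpcb * and never name the outer type. */
#define demo_in4p_laddr(inp) (((struct demo_in4pcb *)(inp))->laddr)
```

Callers keep passing the common pointer around, and the IPv4 PCB no
longer pays for IPv6-sized address fields.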
 1.262 28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.261 19-Feb-2021  christos - Make ALIGNED_POINTER use __alignof(t) instead of sizeof(t). This is more
correct because it works with non-primitive types and provides the ABI
alignment for the type the compiler will use.
- Remove all the *_HDR_ALIGNMENT macros and asserts
- Replace POINTER_ALIGNED_P with ACCESSIBLE_POINTER which is identical to
ALIGNED_POINTER, but returns that the pointer is always aligned if the
CPU supports unaligned accesses.
[ as proposed in tech-kern ]
 1.260 14-Feb-2021  christos - centralize header align and pullup into a single inline function
- use a single macro to align pointers and expose the alignment, instead
of hard-coding 3 in 1/2 the macros.
- fix an issue in the ipv6 l2tp where it was aligning for ipv4 and pulling
for ipv6.
 1.259 20-Aug-2020  riastradh branches: 1.259.2;
[ozaki-r] Changes to the kernel core for wireguard
 1.258 27-Dec-2018  maxv Remove unused arguments.
 1.257 22-Nov-2018  knakahara Support IPv6 NAT-T. Implemented by hsuenaga@IIJ and ohishi@IIJ.

Add ATF later.
 1.256 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.255 15-Jul-2018  maxv Retire ipkdb entirely. The option was removed from the config files
yesterday.

ok kamil christos
 1.254 31-May-2018  maxv branches: 1.254.2;
Remove the non-IKE part of the computation, too.
 1.253 31-May-2018  maxv Remove support for non-IKE markers in the kernel. Discussed on tech-net@,
and now in PR/53334. Basically non-IKE markers come from a deprecated
draft, and our kernel code for them has never worked.

Setsockopt will now reject UDP_ENCAP_ESPINUDP_NON_IKE.

Perhaps we should also add a check in key_handle_natt_info(), to make
sure we also reject UDP_ENCAP_ESPINUDP_NON_IKE in the SADB.
 1.252 18-May-2018  maxv IP6_EXTHDR_GET -> M_REGION_GET, no functional change.
 1.251 13-May-2018  maxv Clarify ESP-in-UDP.
 1.250 01-May-2018  maxv Remove unused argument from udp4_espinudp, and remove unused includes.
 1.249 28-Apr-2018  maxv Remove unused ipsec_var.h includes.
 1.248 13-Apr-2018  maxv Improve the check, we want to have len >= udphdr all the time, and not
just when the packet size doesn't match the mbuf size.

Normally that's not a huge problem, since IP6_EXTHDR_GET gets called
earlier, so we can't have

(ip_len == iphlen + len) && (len < sizeof(struct udphdr))
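
A minimal sketch of the stricter check, assuming ip_len is the total
length from the IP header, iphlen the IP header length and len the
length claimed by the UDP header. demo_udp_len_ok and demo_udphdr are
hypothetical names; this is not the code in udp_input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct udphdr. */
struct demo_udphdr { uint16_t sport, dport, ulen, sum; };
static_assert(sizeof(struct demo_udphdr) == 8, "UDP header is 8 bytes");

static bool
demo_udp_len_ok(size_t ip_len, size_t iphlen, size_t len)
{
	/* Checked unconditionally, not only when the sizes disagree. */
	if (len < sizeof(struct demo_udphdr))
		return false;
	/* The UDP datagram must fit inside the IP payload. */
	if (ip_len < iphlen || ip_len - iphlen < len)
		return false;
	return true;
}
```

With this ordering the case from the log message, where ip_len equals
iphlen + len but len is shorter than a UDP header, is rejected by the
first test.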
 1.247 12-Apr-2018  maxv Remove misleading comment; we're just checking the SP, not verifying the
AH/ESP payload. While here style a bit.
 1.246 19-Mar-2018  roy socket: report receive buffer overflows

Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().

This allows userland to detect route(4) overflows so it can re-sync
with the current state.
 1.245 28-Feb-2018  maxv branches: 1.245.2;
Remove unused ipsec_private.h includes.
 1.244 28-Feb-2018  maxv Remove duplicate IPSEC_STATINC(IPSEC_STAT_IN_POLVIO), ipsec_in_reject
already increases it. IPSEC6_STATINC is now unused, so remove it too.
 1.243 26-Feb-2018  maxv Dedup: merge ipsec4_in_reject and ipsec6_in_reject into ipsec_in_reject.
While here fix misleading comment.

ok ozaki-r@
 1.242 14-Feb-2018  maxv Revert my two last changes in this file. They are apparently causing
problems with racoon, I'll investigate this later.
 1.241 12-Feb-2018  maxv Don't rebase the pointers. 'm' is only allowed to become NULL (which
means 'processed').
 1.240 10-Feb-2018  maxv If the socket wants an ESP-over-UDP packet, and the packet is incorrect,
stop processing it instead of giving it to udp4_sendup. It just doesn't
make any sense not to drop it.

I was already telling myself this the other day when I visited this place,
but I just saw PR/36782 (11 years old) that suggests the exact same thing,
so fix it.

Now, udp4_espinudp always frees the mbuf, and is made void. The packet is
not processed any further afterwards.
 1.239 08-Feb-2018  maxv More style, no functional change.
 1.238 08-Feb-2018  maxv Style, and remove printfs.
 1.237 08-Feb-2018  maxv Fix three pretty bad mistakes in NAT-T:

* If we got a keepalive packet, we need to call m_freem, not m_free.
Here the next mbufs in the chain are not freed. Seems easy to remotely
DoS the system by sending fragmented keepalives in a loop.

* If !ipsec_used, free the mbuf.

* In udp_input, we need to update 'uh', because udp4_realinput may have
modified the chain. Perhaps we also need to re-enforce alignment, so
add an XXX.
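
The m_free versus m_freem distinction in the first item can be shown
with a toy buffer chain. The demo_* functions below only mimic the
behaviour (free one buffer and return its successor, versus free the
whole chain and accept NULL); they are hypothetical and are not the
kernel mbuf API.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_mbuf { struct demo_mbuf *next; };

static int demo_live;	/* number of buffers currently allocated */

/* Allocate one buffer and prepend it to the chain 'next'. */
static struct demo_mbuf *
demo_get(struct demo_mbuf *next)
{
	struct demo_mbuf *m = malloc(sizeof(*m));

	if (m == NULL)
		return NULL;
	m->next = next;
	demo_live++;
	return m;
}

/* Like m_free: release one buffer and return its successor. */
static struct demo_mbuf *
demo_free(struct demo_mbuf *m)
{
	struct demo_mbuf *n;

	assert(m != NULL);
	n = m->next;
	free(m);
	demo_live--;
	return n;
}

/* Like m_freem: release the whole chain; NULL is accepted. */
static void
demo_freem(struct demo_mbuf *m)
{
	while (m != NULL)
		m = demo_free(m);
}
```

Calling demo_free() on the head of a three-buffer chain and discarding
the return value leaves two buffers allocated, which is the leak the
commit describes for fragmented keepalives.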
 1.236 11-Dec-2017  ryo As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.235 10-Aug-2017  ryo Add IP_PKTINFO support for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.234 06-Jul-2017  christos Merge the two copies of SO_TIMESTAMP/SO_OTIMESTAMP processing into a single
function, and add a SOOPT_TIMESTAMP define reducing compat pollution from
5 places to 1.
 1.233 20-Apr-2017  ozaki-r branches: 1.233.4;
Remove unnecessary NULL checks for inp_socket and in6p_socket

They cannot be NULL except for programming errors.
 1.232 20-Apr-2017  ozaki-r Simplify logic of udp4_sendup and udp6_sendup

They are always passed a socket with the same protocol family
as its own: AF_INET for udp4_sendup and AF_INET6 for udp6_sendup.
 1.231 03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.230 24-Jan-2017  ozaki-r Tweak softnet_lock and NET_MPSAFE

- Don't hold softnet_lock in some functions if NET_MPSAFE
- Add softnet_lock to sysctl_net_inet_icmp_redirtimeout
- Add softnet_lock to expire_upcalls of ip_mroute.c
- Restore softnet_lock for in{,6}_pcbpurgeif{,0} if NET_MPSAFE
- Mark some softnet_lock for future work
 1.229 18-Nov-2016  knakahara branches: 1.229.2;
fix: "ifconfig destroy" can stall when "ifconfig" is run in parallel.
This problem occurs only if NET_MPSAFE on.

ifconfig destroy side:
kernel entry point is ifioctl => if_clone_destroy.
pr_purgeif() acquires softnet_lock, and then ifa_remove() calls
pserialize_perform() holding softnet_lock.
ifconfig side:
kernel entry point is socreate.
pr_attach()(udp_attach_wrapper()) calls sosetlock(). In this call path,
sosetlock() tries to acquire softnet_lock.
Together these can cause a deadlock.
 1.228 15-Nov-2016  mlelstv Enforce alignment requirements that are violated in some cases.
For machines that don't need strict alignment (i386,amd64,vax,m68k) this
is a no-op.

Fixes PR kern/50766 but should be improved.
 1.227 19-Oct-2016  ozaki-r Remove unnecessary #ifdef IPSEC

The entire function is already in #ifdef IPSEC.

No functional change.
 1.226 10-Jun-2016  ozaki-r branches: 1.226.2;
Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.225 26-Apr-2016  ozaki-r Sweep unnecessary route.h inclusions
 1.224 15-Feb-2016  rtr Reduce code duplication.

Split creation of IPv4-Mapped IPv6 addresses into its own function
and use it.

No functional change intended. As posted to tech-net@
 1.223 20-Jan-2016  riastradh Give proper prototype to udp_output.
 1.222 24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.221 02-May-2015  rtr make connect syscall use sockaddr_big and modify pr_{send,connect}
nam parameter type from buf * to sockaddr *.

final commit for parameter type changes to protocol user requests

* bump kernel version to 7.99.15 for parameter type changes to pr_{send,connect}
 1.220 26-Apr-2015  rtr remove pr_generic from struct pr_usrreqs and all implementations of
pr_generic in protocols.

bump to 7.99.13

approved by rmind@
 1.219 24-Apr-2015  rtr make accept, getsockname and getpeername syscalls use sockaddr_big and modify
pr_{accept,sockname,peername} nam parameter type from mbuf * to sockaddr *.

* retained use of mbuftypes[MT_SONAME] for now.
* bump to netbsd version 7.99.12 for parameter type change.

patch posted to tech-net@ 2015/04/19
 1.218 03-Apr-2015  rtr * change pr_bind to accept struct sockaddr * instead of struct mbuf *
* update protocol bind implementations to use/expect sockaddr *
instead of mbuf *
* introduce sockaddr_big struct for storage of addr data passed via
sys_bind; sockaddr_big is of sufficient size and alignment to
accommodate all addr data sizes received.
* modify sys_bind to allocate sockaddr_big instead of using an mbuf.
* bump kernel version to 7.99.9 for change to pr_bind() parameter type.

Patch posted to tech-net@
http://mail-index.netbsd.org/tech-net/2015/03/15/msg005004.html

The choice to use a new structure sockaddr_big has been retained since
changing sockaddr_storage size would lead to unnecessary ABI change. The
use of the new structure does not preclude future work that increases
the size of sockaddr_storage and at that time sockaddr_big may be
trivially replaced.

Tested by mrg@ and myself, discussed with rmind@, posted to tech-net@
 1.217 09-Aug-2014  rtr branches: 1.217.2; 1.217.4; 1.217.6; 1.217.10;
split PRU_CONNECT2 & PRU_PURGEIF function out of pr_generic() usrreq
switches and put into separate functions

- always KASSERT(solocked(so)) even if not implemented
(for PRU_CONNECT2 only)

- replace calls to pr_generic() with req = PRU_CONNECT2 with calls to
pr_connect2()

- replace calls to pr_generic() with req = PRU_PURGEIF with calls to
pr_purgeif()

put common code from unp_connect2() (used by unp_connect()) into
unp_connect1() and call out to it when needed

patch only briefly reviewed by rmind@
 1.216 08-Aug-2014  rtr split PRU_RCVD function out of pr_generic() usrreq switches and put into
separate functions

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_RCVD with calls to
pr_rcvd()
 1.215 05-Aug-2014  rtr split PRU_SEND function out of pr_generic() usrreq switches and put into
separate functions

xxx_send(struct socket *, struct mbuf *, struct mbuf *,
struct mbuf *, struct lwp *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_SEND with calls to
pr_send()

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_send() PRUs)

- l2cap_send() -> l2cap_send_pcb()
- sco_send() -> sco_send_pcb()
- rfcomm_send() -> rfcomm_send_pcb()

patch reviewed by rmind
 1.214 05-Aug-2014  rtr revert the removal of struct lwp * parameter from bind, listen and connect
user requests.

this should resolve the issue relating to nfs client hangs presented
recently by wiz on current-users@
 1.213 02-Aug-2014  rtr restore splsoftnet() in various usrreqs that were removed during the PRU
splits. we will properly review removal after the PRU split work is
complete.
 1.212 31-Jul-2014  rtr split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of
pr_generic() usrreq switches and put into separate functions

xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)

- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req =
PRU_{DISCONNECT,SHUTDOWN,ABORT}
with calls to pr_{disconnect,shutdown,abort}() respectively

rename existing internal functions used to implement above functionality
to permit use of the names for xxx_{disconnect,shutdown,abort}().

- {l2cap,sco,rfcomm}_disconnect() ->
{l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()

patch reviewed by rmind
 1.211 30-Jul-2014  rtr split PRU_CONNECT function out of pr_generic() usrreq switches and put
into separate functions

xxx_connect(struct socket *, struct mbuf *)

- always KASSERT(solocked(so)) and KASSERT(nam != NULL)
- replace calls to pr_generic() with req = PRU_CONNECT with
pr_connect()
- rename existing {l2cap,sco,rfcomm}_connect() to
{l2cap,sco,rfcomm}_connect_pcb() respectively to permit
naming consistency with other protocols functions.
- drop struct lwp * parameter from unp_connect() and at_pcbconnect()
and use curlwp instead where appropriate.

patch reviewed by rmind
 1.210 24-Jul-2014  rtr split PRU_BIND and PRU_LISTEN function out of pr_generic() usrreq
switches and put into separate functions
xxx_bind(struct socket *, struct mbuf *)
xxx_listen(struct socket *)

- always KASSERT(solocked(so)) even if not implemented

- replace calls to pr_generic() with req = PRU_BIND with call to
pr_bind()

- replace calls to pr_generic() with req = PRU_LISTEN with call to
pr_listen()

- drop struct lwp * parameter from at_pcbsetaddr(), in_pcbbind() and
unp_bind() and always use curlwp.

rename existing functions that operate on PCB for consistency (and to
free up their names for xxx_{bind,listen}() PRUs)

- l2cap_{bind,listen}() -> l2cap_{bind,listen}_pcb()
- sco_{bind,listen}() -> sco_{bind,listen}_pcb()
- rfcomm_{bind,listen}() -> rfcomm_{bind,listen}_pcb()

patch reviewed by rmind

welcome to netbsd 6.99.48
 1.209 23-Jul-2014  rtr split PRU_SENDOOB and PRU_RCVOOB function out of pr_generic() usrreq
switches and put into separate functions
xxx_sendoob(struct socket *, struct mbuf *, struct mbuf *)
xxx_recvoob(struct socket *, struct mbuf *, int)

- always KASSERT(solocked(so)) even if request is not implemented

- replace calls to pr_generic() with req = PRU_{SEND,RCV}OOB with
calls to pr_{send,recv}oob() respectively.

there is still some tweaking of m_freem(m) and m_freem(control) to come
for consistency. not performed with this commit for clarity.

reviewed by rmind
 1.208 09-Jul-2014  rtr * split PRU_ACCEPT function out of pr_generic() usrreq switches and put
into a separate function xxx_accept(struct socket *, struct mbuf *)

note: future cleanup will take place to remove struct mbuf parameter
type and replace it with a more appropriate type.

patch reviewed by rmind
 1.207 09-Jul-2014  rtr * split PRU_PEERADDR and PRU_SOCKADDR function out of pr_generic()
usrreq switches and put into separate functions
xxx_{peer,sock}addr(struct socket *, struct mbuf *).

- KASSERT(solocked(so)) always in new functions even if request
is not implemented

- KASSERT(pcb != NULL) and KASSERT(nam) if the request is
implemented and not for tcp.

* for tcp roll #ifdef KPROF and #ifdef DEBUG code from tcp_usrreq() into
easier to cut & paste functions tcp_debug_capture() and
tcp_debug_trace()

- functions provided by rmind
- remaining use of PRU_{PEER,SOCK}ADDR #define to be removed in a
future commit.

* rename netbt functions to permit consistency of pru function names
(as has been done with other requests already split out).

- l2cap_{peer,sock}addr() -> l2cap_{peer,sock}_addr_pcb()
- rfcomm_{peer,sock}addr() -> rfcomm_{peer,sock}_addr_pcb()
- sco_{peer,sock}addr() -> sco_{peer,sock}_addr_pcb()

* split/refactor do_sys_getsockname(lwp, fd, which, nam) into
two functions do_sys_get{peer,sock}name(fd, nam).

- move PRU_PEERADDR handling into do_sys_getpeername() from
do_sys_getsockname()
- have svr4_stream directly call do_sys_get{sock,peer}name()
respectively instead of providing `which' & fix a DPRINTF string
that incorrectly wrote "getpeername" when it meant "getsockname"
- fix sys_getpeername() and sys_getsockname() to call
do_sys_get{sock,peer}name() without `which' and `lwp' & adjust
comments
- bump kernel version for removal of lwp & which parameters from
do_sys_getsockname()

note: future cleanup to remove struct mbuf * abuse in
xxx_{peer,sock}name()
still to come, not done in this commit since it is easier to do post
split.

patch reviewed by rmind

welcome to 6.99.47
 1.206 07-Jul-2014  rtr * sprinkle KASSERT(solocked(so)); in all pr_stat() functions.
* fix remaining inconsistent struct socket parameter names.
 1.205 07-Jul-2014  rtr backout change that made pr_stat return EOPNOTSUPP for protocols that
were not filling in struct stat.

decision made after further discussion with rmind and investigation of
how other operating systems behave. soo_stat() is doing just enough to
be able to call what gets returned valid and thus justifies a return of
success.

additional review will be done to determine if the pr_stat functions
that were already returning EOPNOTSUPP can be considered successful with
what soo_stat() is doing.
 1.204 07-Jul-2014  rtr * have pr_stat return EOPNOTSUPP consistently for all protocols that do
not fill in struct stat instead of returning success.

* in pr_stat remove all checks for non-NULL so->so_pcb except where the
pcb is actually used (i.e. cases where we don't return EOPNOTSUPP).

proposed on tech-net@
 1.203 06-Jul-2014  rtr * split PRU_SENSE functionality out of xxx_usrreq() switches and place into
separate xxx_stat(struct socket *, struct stat *) functions.
* replace calls using pr_generic with req == PRU_SENSE with pr_stat().

further change will follow that cleans up the pattern used to extract the
pcb and test for its presence.

reviewed by rmind
 1.202 01-Jul-2014  rtr fix parameter types in pr_ioctl, called xx_control() functions and remove
abuse of pointer to struct mbuf type.

param2 changed to u_long type and uses parameter name 'cmd' (ioctl command)
param3 changed to void * type and uses parameter name 'data'
param4 changed to struct ifnet * and uses parameter name 'ifp'
param5 has been removed (formerly struct lwp *) and uses of 'l' have been
replaced with curlwp from curproc(9).

callers have had (now unnecessary) casts to struct mbuf * removed, called
code has had (now unnecessary) casts to u_long, void * and struct ifnet *
respectively removed.

reviewed by rmind@
 1.201 23-Jun-2014  rtr where appropriate rename xxx_ioctl() struct mbuf * parameters from
`control' to `ifp' after split from xxx_usrreq().

sys_socket.c
fix wrapping of arguments to be consistent with other function calls
in the file after replacing pr_usrreq() call with pr_ioctl() which
required one less argument.

link_proto.c
fix indentation of parameters in link_ioctl() prototype to be
consistent with the rest of the file.

discussed with rmind@
 1.200 22-Jun-2014  rtr * split PRU_CONTROL functionality out of xxx_userreq() switches and place
into separate xxx_ioctl() functions.
* place KASSERT(req != PRU_CONTROL) inside xxx_userreq() as it is now
inappropriate for req = PRU_CONTROL in xxx_userreq().
* replace calls to pr_generic() with req = PRU_CONTROL with pr_ioctl().
* remove & fixup references to PRU_CONTROL xxx_userreq() function comments.
* fix various comments references for xxx_userreq() that mentioned
PRU_CONTROL as xxx_userreq() no longer handles the request.

a further change will follow to fix parameter and naming inconsistencies
retained from original code.

Reviewed by rmind@
 1.199 30-May-2014  christos Introduce 2 new variables: ipsec_enabled and ipsec_used.
ipsec_enabled is controlled by sysctl and determines if IPsec is allowed.
ipsec_used is set automatically based on ipsec being enabled, and
rules existing.
 1.198 22-May-2014  rmind Move udp6_input(), udp6_sendup(), udp6_realinput() and udp6_input_checksum()
from udp_usrreq.c to udp6_usrreq.c where they belong. No functional change.
 1.197 20-May-2014  rmind Adjust PR_WRAP_USRREQS() to include the attach/detach functions.
We still need the kernel-lock for some corner cases.
 1.196 19-May-2014  rmind - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
 1.195 18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.194 25-Feb-2014  pooka branches: 1.194.2;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.193 04-Jan-2014  pooka IPv6 UDP uses the IPv4 pcb tables, and therefore the stats, so need
to create the percpu UDPv4 counters even in a v6-only system.
 1.192 02-Jan-2014  pooka Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.191 23-Nov-2013  christos convert from CIRCLEQ to TAILQ.
 1.190 05-Jun-2013  christos branches: 1.190.2;
IPSEC has not come in two speeds for a long time now (IPSEC == kame,
FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
 1.189 05-Jun-2013  christos conditionalize the net traversal code on FAST_IPSEC to make rump build.
 1.188 04-Jun-2013  christos PR/47886: Dr. Wolfgang Stukenbrock: IPSEC_NAT_T enabled kernels may access
outdated pointers and pass ESP data to UDP-sockets.
While here, simplify the code and remove the IPSEC_NAT_T option; always
compile nat-traversal in so that it does not bitrot.
 1.187 22-Jun-2012  christos branches: 1.187.2;
PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.186 22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.185 09-Jan-2012  liamjfoy minor typo fix
 1.184 19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.183 24-Sep-2011  christos branches: 1.183.2; 1.183.6;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.182 17-Jul-2011  joerg Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.181 03-Jul-2011  mrg avoid an uninitialised variable warning. this one seems a false
positive, but since it's for some hacky workaround code anyway...
 1.180 03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
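
As an illustration only, the class-to-MSL mapping described above could be sketched as follows. The enum, the table and both function names are hypothetical and are not taken from the committed code; only the three classes and the default values (2, 10 and 60 seconds) come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three peer classes described above. */
enum mslt_class { MSLT_LOOPBACK = 0, MSLT_LOCAL = 1, MSLT_REMOTE = 2 };

/* Default MSL per class, in seconds, as quoted in the commit message. */
static const int mslt_default_msl[] = { 2, 10, 60 };

/*
 * Classify a session from two illustrative predicates: whether the peer
 * is the local host, and whether it is on the same link/subnet.
 */
static enum mslt_class
mslt_classify(bool same_host, bool same_link)
{
	if (same_host)
		return MSLT_LOOPBACK;
	if (same_link)
		return MSLT_LOCAL;
	return MSLT_REMOTE;
}

/* Look up the MSL that a session of the given class would use. */
static int
mslt_msl(enum mslt_class c)
{
	assert((int)c >= 0 && (int)c <= 2);
	return mslt_default_msl[c];
}
```

Nearer peers fall through to a smaller table entry, which is why loopback and local sessions expire sooner.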

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
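
The two memory-saving ideas above (narrow pool indices instead of pointers, and oldest-first reuse of a fixed-size pool) can be sketched roughly as below. This is not the VTW implementation: every name, the tiny pool size and the single 32-bit key standing in for the connection identity are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VP_POOL    4            /* pool entries (tiny, for illustration) */
#define VP_BUCKETS 2            /* hash buckets */
#define VP_NONE    UINT16_MAX   /* "null" index */

struct vp_entry {
	uint32_t key;   /* stand-in for the connection identity */
	uint16_t next;  /* next entry in the hash chain, as a pool index */
	uint8_t  used;
};

struct vp_table {
	struct vp_entry pool[VP_POOL];
	uint16_t bucket[VP_BUCKETS]; /* chain heads, as pool indices */
	uint16_t oldest;             /* slots are filled in ring order */
};

static void
vp_init(struct vp_table *t)
{
	for (int i = 0; i < VP_BUCKETS; i++)
		t->bucket[i] = VP_NONE;
	for (int i = 0; i < VP_POOL; i++)
		t->pool[i].used = 0;
	t->oldest = 0;
}

/* Remove pool entry idx from its hash chain. */
static void
vp_unlink(struct vp_table *t, uint16_t idx)
{
	uint16_t *link = &t->bucket[t->pool[idx].key % VP_BUCKETS];

	while (*link != VP_NONE && *link != idx)
		link = &t->pool[*link].next;
	assert(*link == idx);
	*link = t->pool[idx].next;
}

/* Insert a key, discarding the oldest entry if the pool is full. */
static void
vp_insert(struct vp_table *t, uint32_t key)
{
	uint16_t idx = t->oldest;
	struct vp_entry *e = &t->pool[idx];

	if (e->used)
		vp_unlink(t, idx);
	e->key = key;
	e->used = 1;
	e->next = t->bucket[key % VP_BUCKETS];
	t->bucket[key % VP_BUCKETS] = idx;
	t->oldest = (uint16_t)((idx + 1) % VP_POOL);
}

/* Return 1 if the key is present, 0 otherwise. */
static int
vp_lookup(const struct vp_table *t, uint32_t key)
{
	for (uint16_t i = t->bucket[key % VP_BUCKETS]; i != VP_NONE;
	    i = t->pool[i].next)
		if (t->pool[i].key == key)
			return 1;
	return 0;
}
```

Each link costs two bytes rather than a full pointer, and because slots are reused in ring order the slot about to be overwritten always holds the oldest entry.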

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.179 16-Sep-2009  pooka branches: 1.179.4; 1.179.6;
Replace a large number of link set based sysctl node creations with
calls from subsystem constructors. Benefits both future kernel
modules and rump.

no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
 1.178 19-Jul-2009  minskim Enable IP_MINTTL option for SOCK_DGRAM sockets.
 1.177 18-Apr-2009  tsutsui Remove extra whitespace added by a stupid tool.
XXX: more in src/sys/arch
 1.176 18-Mar-2009  cegger bcopy -> memcpy
 1.175 18-Mar-2009  cegger bzero -> memset
 1.174 19-Jan-2009  christos branches: 1.174.2;
Provide compatibility to the old timeval SCM_TIMESTAMP messages.
 1.173 06-Aug-2008  plunky branches: 1.173.2;
Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.172 04-May-2008  thorpej branches: 1.172.2; 1.172.6;
Simplify the interface to netstat_sysctl() and allocate space for
the collated counters using kmem_alloc().

PR kern/38577
 1.171 26-Apr-2008  yamt branches: 1.171.2;
udp_init: don't forget to allocate udp6stat_percpu.
 1.170 24-Apr-2008  ad Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.169 23-Apr-2008  thorpej Make IPSEC and FAST_IPSEC stats per-cpu. Use <net/net_stats.h> and
netstat_sysctl().
 1.168 15-Apr-2008  thorpej branches: 1.168.2;
Make udp6 stats per-cpu.
 1.167 15-Apr-2008  thorpej Make ip6 and icmp6 stats per-cpu.
 1.166 12-Apr-2008  thorpej Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
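
A rough userland sketch of the per-CPU idea: each CPU increments only its own row of counters, and the rows are summed only when a reader asks for them. The array layout, the counter names and the fixed CPU count are assumptions for illustration; the kernel uses its own per-CPU facilities rather than a plain two-dimensional array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_NCPU 4   /* assumed CPU count, for illustration */

/* Hypothetical counter indices. */
enum { EX_STAT_IPACKETS, EX_STAT_BADSUM, EX_STAT_NSTATS };

/* One row of counters per CPU, so the hot path touches no shared row. */
static uint64_t ex_stats_percpu[EX_NCPU][EX_STAT_NSTATS];

/* Hot path: bump a counter on the given CPU's private row. */
static void
ex_stat_inc(unsigned cpu, unsigned stat)
{
	assert(cpu < EX_NCPU && stat < EX_STAT_NSTATS);
	ex_stats_percpu[cpu][stat]++;
}

/* Slow path: collate all rows into one array for the reader. */
static void
ex_stats_collate(uint64_t out[EX_STAT_NSTATS])
{
	memset(out, 0, EX_STAT_NSTATS * sizeof(out[0]));
	for (unsigned cpu = 0; cpu < EX_NCPU; cpu++)
		for (unsigned s = 0; s < EX_STAT_NSTATS; s++)
			out[s] += ex_stats_percpu[cpu][s];
}
```

The cost of summation is paid by the infrequent reader instead of by every packet.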
 1.165 08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.164 06-Apr-2008  thorpej Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old udpstat structure; old netstat
binaries will continue to work properly.
 1.163 27-Nov-2007  christos branches: 1.163.14;
require that the options argument is the right size, not that it is greater
or equal to the requested size. Suggested by Matt Thomas.
 1.162 02-Sep-2007  dyoung branches: 1.162.6;
m_copym(..., 0, M_COPYALL, ...) -> m_copypacket(..., ...).
 1.161 02-Sep-2007  dyoung m_copy() was deprecated, apparently, long ago. m_copy(...) ->
m_copym(..., M_DONTWAIT).
 1.160 27-Jun-2007  degroote branches: 1.160.2; 1.160.6; 1.160.8;
Add support for options IPSEC_NAT_T (RFC 3947 and 3948) for fast_ipsec(4).

No objection on tech-net@
 1.159 12-May-2007  dyoung Use sockaddr_in_init().
 1.158 04-Mar-2007  christos branches: 1.158.2; 1.158.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.157 17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.156 14-Nov-2006  rpaulo branches: 1.156.4;
Remove ifndef COMPAT_42. No objections in tech-net.
 1.155 10-Nov-2006  yamt udp_ctloutput: plug a memory leak.
 1.154 10-Nov-2006  yamt remove some __unused in function parameters.
 1.153 10-Nov-2006  yamt udp_ctloutput: remove unnecessary goto and break.
 1.152 10-Nov-2006  yamt udp_ctloutput: ansify.
 1.151 12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.150 10-Oct-2006  dogcow change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)
 1.149 05-Oct-2006  tls Protect calls to pool_put/pool_get that may occur in interrupt context
with spl used to protect other allocations and frees, or datastructure
element insertion and removal, in adjacent code.

It is almost unquestionably the case that some of the spl()/splx() calls
added here are superfluous, but it really seems wrong to see:

s=splfoo();
/* frob data structure */
splx(s);
pool_put(x);

and if we think we need to protect the first operation, then it is hard
to see why we should not think we need to protect the next. "Better
safe than sorry".

It is also almost unquestionably the case that I missed some pool
gets/puts from interrupt context with my strategy for finding these
calls; use of PR_NOWAIT is a strong hint that a pool may be used from
interrupt context but many callers in the kernel pass a "can wait/can't
wait" flag down such that my searches might not have found them. One
notable area that needs to be looked at is pf.

See also:

http://mail-index.netbsd.org/tech-kern/2006/07/19/0003.html
http://mail-index.netbsd.org/tech-kern/2006/07/19/0009.html
 1.148 23-Jul-2006  ad branches: 1.148.4; 1.148.6;
Use the LWP cached credentials where sane.
 1.147 23-Feb-2006  christos branches: 1.147.2;
Handle IPSEC_NAT_T in the FAST_IPSEC case.
XXX: need to fix the FAST_IPSEC code now.
 1.146 21-Jan-2006  rpaulo branches: 1.146.2; 1.146.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.145 11-Dec-2005  christos branches: 1.145.2;
merge ktrace-lwp.
 1.144 09-Dec-2005  manu Fix a bug in ESP over UDP: because udp4_espinudp() called m_pullup, it
could modify the struct mbuf and calling functions (udp_input() and
udp4_realinput()) would have used a garbled local copy of the pointer.

The fix is not perfect. udp4_espinudp() should use m_pulldown()...
 1.143 15-Nov-2005  dsl Pass the current process structure to in_pcbconnect() so that it can
pass it to in_pcbbind() so that can allocate a low numbered port
if setsockopt() has been used to set IP_PORTRANGE to IP_PORTRANGE_LOW.
While there, fail in_pcbconnect() if the in_pcbbind() fails - rather
than sending the request out from a port of zero.
This has been largely broken since the socket option was added in 1998.
 1.142 03-Sep-2005  kleink branches: 1.142.6;
udp4_espinudp(): don't assume that the Non-ESP marker (or UDP payload)
is aligned on a 64-bit boundary.
 1.141 10-Aug-2005  yamt move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.140 10-Aug-2005  yamt device independent part of ipv6 rx checksum offloading.
 1.139 05-Aug-2005  elad Add sysctls for IP, ICMP, TCP, and UDP statistics.
 1.138 29-Apr-2005  manu branches: 1.138.2;
Fix memory leak
 1.137 25-Apr-2005  manu Don't sleep when handling ESP over UDP packets.
 1.136 23-Apr-2005  manu Enhance IPSEC_NAT_T so that it can work with multiple machines behind the
same NAT.
 1.135 18-Apr-2005  yamt fix problems related to loopback interface checksum omission. PR/29971.

- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)

ok'ed by Jason Thorpe.
 1.134 11-Mar-2005  atatat branches: 1.134.2;
Revert the change that made kern.file2 and net.*.*.pcblist into nodes
instead of structs. It had other deleterious side-effects that are
rather nasty. Another solution must be found.
 1.133 10-Mar-2005  atatat Change types of kern.file2 and net.*.*.pcblist to NODE
 1.132 09-Mar-2005  atatat Add the following nodes to the sysctl tree:

net.local.stream.pcblist
net.local.dgram.pcblist
net.inet.tcp.pcblist
net.inet.udp.pcblist
net.inet.raw.pcblist
net.inet6.tcp6.pcblist
net.inet6.udp6.pcblist
net.inet6.raw6.pcblist

which allow retrieval of the pcbs in use for those protocols. The
struct involved is 32/64 bit clean and incorporates parts of struct
inpcb, struct unpcb, a bit of struct tcpcb, and two socket addresses.
 1.131 26-Feb-2005  perry nuke trailing whitespace
 1.130 12-Feb-2005  manu Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.129 21-Dec-2004  yamt branches: 1.129.2; 1.129.4;
factor out receive side tcp/udp checksum handling code so that they
can be used by eg. packet filters.

reviewed by Christos Zoulas on tech-net@.
(slightly tweaked since then to make tcp and udp similar.)
 1.128 19-Dec-2004  christos yamt's changes seem to fix all the checksumming issues. Turn the loopback
checksums back off so we can make sure that everything works.
 1.127 18-Dec-2004  yamt udp6_input: correct loopback test.
 1.126 17-Dec-2004  christos Turn checksumming on loopback back on until we fix the bugs in it.
Connect over tcp on the loopback is broken:

4729 amq 0.000007 CALL connect(4,0x804f2a0,0x1c)
4729 amq 75.007420 RET connect -1 errno 60 Connection timed out
 1.125 15-Dec-2004  thorpej Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.124 03-Sep-2004  darrenr add a per-socket counter for dropped UDP packets when the internal buffers
are full.
 1.123 02-Jul-2004  heas Adjust description for net.inet.udp.checksum; it does not control checking,
only computing.
 1.122 25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.121 07-May-2004  jonathan Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.

New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)

Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)

sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)

sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)

sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)

Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":

New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)

Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.120 01-May-2004  matt Use EVCNT_ATTACH_STATIC{,2}
 1.119 18-Apr-2004  matt ANSI'fy and de __P
 1.118 31-Mar-2004  itojun clean previous commit (uh_sum != 0 check in IPv6)
 1.117 31-Mar-2004  itojun drop packet if IPv6 udp packet does not have checksum (checksum is mandatory
in IPv6).
 1.116 24-Mar-2004  atatat branches: 1.116.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.115 10-Mar-2004  drochner fix tcp/udp checksum test in the M_CSUM_NO_PSEUDOHDR case
(this can never have worked)
now I can use a "bge" gigabit interface with hw checksumming
ttcp-t: 2147483648 bytes in 18.31 real seconds = 114527.11 KB/sec +++
woow!
 1.114 04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.113 23-Oct-2003  mycroft Remove all the code to maintain ia_inpcbs. This information was only used to
close sockets on address changes, which was deemed to be a bad idea and was
summarily removed, so there is no point in wasting effort on maintaining it
any more.
 1.112 18-Oct-2003  enami Fix indent.
 1.111 25-Sep-2003  mycroft Fix glaring errors in recent changes.
 1.110 12-Sep-2003  itojun send icmp admin prohibit if socket policy mismatches.
 1.109 04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.108 22-Aug-2003  itojun remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
 1.107 22-Aug-2003  itojun change the additional arg to be passed to ip{,6}_output to struct socket *.

this fixes KAME policy lookup which was broken by the previous commit.
 1.106 21-Aug-2003  jonathan Honour the M_CSUM_NO_PSEUDOHDR, if set on inbound TCP and UDP packets.
Tested against bcm5700 with patched if_bge.c.
 1.105 15-Aug-2003  jonathan (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if
configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or
with no IPsec should work as before.

All calls to ip_output() now always pass an additional compulsory
argument: the inpcb associated with the packet being sent,
or 0 if no inpcb is available.

Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
 1.104 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.103 29-Jun-2003  fvdl branches: 1.103.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.102 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.101 23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.100 15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.99 14-May-2003  itojun always use PULLDOWN_TEST codepath.
 1.98 26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.97 20-Jan-2003  simonb Remove variables that are only assigned to but not referenced.
 1.96 14-Aug-2002  itojun avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE
optimization made last year. should solve PR 17867 and 10195.

IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to
provide IP_HDRINCL variant that does not swap endian.
 1.95 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
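
A minimal sketch of the macro pattern described above, under stated assumptions: the EXAMPLE_ names are hypothetical stand-ins for the *_HDR_ALIGNED_P() family and __NO_STRICT_ALIGNMENT, and the 4-byte alignment requirement is assumed here for illustration rather than taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * On platforms without strict alignment the test collapses to the
 * constant 1, so the compiler can drop the fix-up path entirely.
 */
#ifdef EXAMPLE_NO_STRICT_ALIGNMENT
#define EXAMPLE_HDR_ALIGNED_P(p)	1
#else
#define EXAMPLE_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif

/*
 * Return nonzero if the header would have to be copied to an aligned
 * location before its fields can be accessed safely.
 */
static int
example_hdr_needs_fixup(const void *hdr)
{
	assert(hdr != NULL);
	return !EXAMPLE_HDR_ALIGNED_P(hdr);
}
```

A caller would take the copy-and-align path only when the function returns nonzero, which mirrors the note that the new path is not visited when drivers already deliver aligned packets.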
 1.94 09-Jun-2002  itojun whitespace
 1.93 12-May-2002  matt branches: 1.93.2; 1.93.4;
Eliminate commons.
 1.92 21-Dec-2001  itojun comment and whitespace. sync with kame
 1.91 13-Nov-2001  lukem add RCSIDs
 1.90 07-Nov-2001  itojun do not grab packet to joined multicast group, when ip6_dst and in6p_laddr
mismatches. it makes the behavior closer to 4.4BSD IPv4 code.
sync with kame
 1.89 04-Nov-2001  matt Convert netinet to not use the internal <sys/queue.h> field names
but instead the access macros. Use the FOREACH macros where appropriate.
 1.88 02-Nov-2001  itojun array boundary overflow on the use of IPv4 mapped address. from simonb
 1.87 29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.86 24-Oct-2001  itojun remove unused codepath (unifdef -UUDP6)
 1.85 15-Oct-2001  itojun branches: 1.85.2;
implement IPV6_V6ONLY socket option from draft-ietf-ipngwg-rfc2553bis-03.txt.
IPV6_BINDV6ONLY (netbsd only) is deprecated, but still work just like before.
 1.84 17-Sep-2001  thorpej Split the pre-computed ifnet checksum flags into Tx and Rx directions.
Add capabilities bits that indicate an interface can only perform
in-bound TCPv4 or UDPv4 checksums. There is at least one Gig-E chip
for which this is true (Level One LXT-1001), and this is also the
case for the Intel i82559 10/100 Ethernet chips.
 1.83 25-Jul-2001  itojun branches: 1.83.2;
allocate ipsec policy buffer attached to pcb in in*_pcballoc, before
giving anyone access to the pcb (do not reveal an inconsistent one).
sync with kame
 1.82 25-Jul-2001  itojun g/c #if 0'ed fragment. sync with kame.
 1.81 17-Jul-2001  enami Add missing counting up of ``socket buffer is full'' counter when
failed to sbappendaddr().
 1.80 03-Jul-2001  itojun branches: 1.80.2;
call in{,6}_pcbpurgeif0() before in{,6}_purgeif().
 1.79 27-Jun-2001  itojun fix udp reception to sockets bound to linklocal address (like fe80::1%lo0).
sync with kame
 1.78 02-Jun-2001  thorpej Implement support for IP/TCP/UDP checksum offloading provided by
network interfaces. This works by pre-computing the pseudo-header
checksum and caching it, delaying the actual checksum to ip_output()
if the hardware cannot perform the sum for us. In-bound checksums
can either be fully-checked by hardware, or summed up for final
verification by software. This method was modeled after how this
is done in FreeBSD, although the code is significantly different in
most places.

We don't delay checksums for IPv6/TCP, but we do take advantage of the
cached pseudo-header checksum.

Note: hardware-assisted checksumming defaults to "off". It is
enabled with ifconfig(8). See the manual page for details.

Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet,
3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
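
The two-stage scheme described in 1.78 can be sketched roughly as follows. This is a minimal illustration, not the committed kernel code: the function names (fold16, pseudo_hdr_sum, finish_cksum) are hypothetical, and addresses and lengths are taken in host byte order to keep the arithmetic readable. The pseudo-header sum is computed once and cached; the sum over the transport header and payload is added later, either in software or by hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t
fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Stage 1 (early, cacheable): sum the IPv4 pseudo-header fields.
 * All arguments are in host byte order in this sketch.
 */
static uint16_t
pseudo_hdr_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	uint32_t sum = 0;

	sum += (src >> 16) + (src & 0xffff);
	sum += (dst >> 16) + (dst & 0xffff);
	sum += proto;
	sum += len;
	return fold16(sum);
}

/*
 * Stage 2 (delayed): add the transport header and payload bytes to
 * the cached partial sum, then complement the result.
 */
static uint16_t
finish_cksum(uint16_t partial, const uint8_t *data, size_t len)
{
	uint32_t sum = partial;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((data[i] << 8) | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)(data[len - 1] << 8);	/* pad odd byte */
	return (uint16_t)~fold16(sum);
}
```

Running finish_cksum() over a datagram whose checksum field already holds the correct value yields 0, which is the usual receive-side verification.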
 1.77 31-May-2001  soda missing opt_inet.h
 1.76 08-May-2001  itojun correct faith prefix determination. use sys/netinet/if_faith.c:faithprefix()
to determine. sync with kame.
(without this change, non-faith socket may mistakenly accept for-faith traffic)
 1.75 24-Jan-2001  itojun branches: 1.75.2;
- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones),
so that VPN setting can work with NAT/ipfilter settings.
sync with kame.

TODO: use header history for stricter inbound validation
 1.74 04-Dec-2000  itojun more on previous (udp4 multicast fix)
 1.73 04-Dec-2000  itojun fix multicast inbound packet processing.
NetBSD PR 11629 From: salvet@ics.muni.cz
 1.72 17-Oct-2000  itojun be more friendly with INET-less build.
XXX we need to do more to do a working INET-less build
 1.71 30-Aug-2000  itojun minor typo. s/iPsec/IPsec/
 1.70 24-Jul-2000  sommerfeld Drop packet, increment udps_badlen if the udp header length field
reports a size smaller than the udp header; defends against bogosity
detected by Assar Westerlund.

This patch and the previous ip_icmp.c change were the joint work of
assar, itojun, and myself.
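
The check described in 1.70 amounts to something like the sketch below. The helper name udp_len_ok and the counter badlen_count are hypothetical stand-ins (the latter for udps_badlen); the upper-bound comparison against the available byte count is an added assumption, not something stated in the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN 8	/* size of the fixed UDP header in bytes */

/* Hypothetical counter standing in for the kernel's udps_badlen. */
static unsigned long badlen_count;

/*
 * Return 1 if the datagram may be processed, 0 if it must be dropped.
 * uh_ulen is the UDP header length field in host byte order; avail is
 * the number of bytes actually present after the IP header.
 */
static int
udp_len_ok(uint16_t uh_ulen, size_t avail)
{
	/* The length field can never be smaller than the header itself. */
	if (uh_ulen < UDP_HDR_LEN) {
		badlen_count++;
		return 0;
	}
	/* Assumption: also reject a length that exceeds what arrived. */
	if (uh_ulen > avail) {
		badlen_count++;
		return 0;
	}
	return 1;
}
```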
 1.69 07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.68 06-Jul-2000  itojun remove unnecessary #include <netkey/key_debug.h>. from kame.
 1.67 28-Jun-2000  mrg <vm/vm.h> -> <uvm/uvm_extern.h>
 1.66 30-Mar-2000  augustss branches: 1.66.4;
Remove register declarations.
 1.65 30-Mar-2000  simonb Delete redundant decl of inetctlerrmap - it's in <netinet/in_var.h>.
 1.64 22-Mar-2000  ws Make IPKDB work again.
Add support for i386 debugging and pci-based ne2000 boards.
 1.63 01-Mar-2000  itojun introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other uses.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.62 29-Feb-2000  itojun avoid copy-overwrite-copy on incoming udp4 checksum. use in4_cksum
which takes care of pseudo header checksum without overwrites.
 1.61 11-Feb-2000  itojun don't increase both "no port on broadcast packet" and "no port" stat.
increasing both of them will result in negative number on udp
"delivered" stat on netstat(8), since netstat computes number of delivered
packet by subtracting them from number of inbound packets.
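
The arithmetic behind 1.61 can be illustrated as below. The structure and field names are hypothetical and only loosely modeled on the UDP statistics; the point is that a "delivered" figure derived by subtraction goes negative if one dropped packet is counted under two drop reasons.

```c
#include <stdint.h>

/* Hypothetical subset of UDP counters; names are illustrative only. */
struct udp_counters {
	uint64_t ipackets;	/* total inbound datagrams */
	uint64_t hdrops;	/* dropped: short header */
	uint64_t badlen;	/* dropped: bad length */
	uint64_t badsum;	/* dropped: bad checksum */
	uint64_t noport;	/* dropped: no listener */
	uint64_t noportbcast;	/* dropped: no listener, broadcast */
	uint64_t fullsock;	/* dropped: receive buffer full */
};

/* Delivered = inbound packets minus every category of drop. */
static int64_t
udp_delivered(const struct udp_counters *c)
{
	uint64_t drops = c->hdrops + c->badlen + c->badsum +
	    c->noport + c->noportbcast + c->fullsock;

	return (int64_t)c->ipackets - (int64_t)drops;
}
```

With one inbound broadcast that finds no listener, bumping only noportbcast gives 0 delivered; bumping noport as well gives -1.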
 1.60 02-Feb-2000  thorpej PRU_PURGEADDR -> PRU_PURGEIF, per a discussion w/ itojun. In the IPv4
and IPv6 code, also use this to traverse PCB tables, looking for cached
routes referencing the dying ifnet, forcing them to be refreshed.
 1.59 01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studenmund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.58 31-Jan-2000  itojun bring in latest KAME ipsec tree.
- interop issues in ipcomp is fixed
- padding type (after ESP) is configurable
- key database memory management (need more fixes)
- policy specification is revisited

XXX m->m_pkthdr.rcvif is still overloaded - hope to fix it soon
 1.57 31-Jan-2000  itojun destination port == 0 is illegal based on RFC768.
(NetBSD PR: 9137 - I thought I committed this already but I hadn't)
 1.56 06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.55 06-Jan-2000  itojun make IPV6_BINDV6ONLY setsockopt available. it controls behavior of
AF_INET6 wildcard listening socket. heavily documented in ip6(4).
net.inet6.ip6.bindv6only defines default value. default is 1.

"options INET6_BINDV6ONLY" removes any code fragment that supports
IPV6_BINDV6ONLY == 0 case (not defopt'ed as use of this is rare).
 1.54 22-Dec-1999  itojun drop IPv6 packets with v4 mapped address on src/dst. they are illegal
and may be used to fool IPv6 implementations (by using ::ffff:127.0.0.1 as
source you may be able to pretend the packet is from local node)
 1.53 13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.52 13-Sep-1999  itojun branches: 1.52.2; 1.52.8;
- Call in{,6}_pcbdetach if ipsec initialization fails during PRU_ATTACH.
This situation happens on severe memory shortage. We may need more
improvements here and there.
- Grab IEEE802 address from IFT_ETHER card, even if the card is
inserted after bootup time. Is there any other card that can be
inserted afterwards? pcmcia fddi card? :-P
- RFC2373 u bit handling suggests that we SHOULD NOT copy interface id from
ethernet card to pseudo interface, when ethernet card has IEEE802/EUI64
with u bit != 0 (this means that IEEE802/EUI64 is not universally unique).
Do not use such address as, for example, interface id for gif interface.
(I have such an ethernet card myself)
This may change interface id for your gif interface. be careful upgrading
rc files.

(sync with recent KAME)
 1.51 09-Aug-1999  itojun return with doing nothing from xx_ctlinput(), when sa->sa_family
is not the expected one.

I see PRC_REDIRECT_HOST with sa->sa_family == AF_UNIX coming to
{tcp,udp}_ctlinput() when I use dhclient, and I feel like adding
more sanity checks, without logging - if we log it it is too noisy.
 1.50 09-Jul-1999  thorpej defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
 1.49 06-Jul-1999  drochner for incoming broadcasts, strip IP/UDP header correctly
wrap a line
 1.48 01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.47 19-Jan-1999  mycroft branches: 1.47.4; 1.47.6;
Don't screw with ip_len; just subtract from it where we actually use the
value.
 1.46 19-Jan-1999  mycroft Don't overwrite the checksum fields when checking them. There's no reason to
do this, and it screws up ICMP replies.
XXX The returned IP checksum and length are still wrong.
 1.45 11-Jan-1999  thorpej Fix byte order and ip_len inconsistencies in ICMP reply code. Also, fix
some formatting and HTONS(foo) vs. foo = htons(foo) inconsistencies.

PR #6602, Darren Reed.
 1.44 05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.43 12-Sep-1997  drochner Adjust packet len in mbuf header for incoming broadcasts.
Closes PR kern/4087 (by myself).
 1.42 28-Jul-1997  thorpej branches: 1.42.2;
Make the following tunable via sysctl, inspired by BSD/OS:
- udp_sendspace
- udp_recvspace
 1.41 24-Jun-1997  thorpej Don't adjust ip->ip_len before calling icmp_error(); icmp_error() already
does this. Per Stevens in TCP/IP Illustrated Vol. 2, p.774, submitted
by Koji Imada <koji@math.human.nagoya-u.ac.jp>.
 1.40 11-Jan-1997  thorpej Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
 1.39 25-Oct-1996  thorpej In udp_output(), sanity check the length of the packet to be transmitted.
If it's larger than IP_MAXPACKET, return an error condition.
Based on a patch from Bill Fenner <fenner@parc.xerox.com>
 1.38 16-Oct-1996  ws Rename recently checked in KGDB to IPKDB to resolve conflicts with older KGDB
 1.37 30-Sep-1996  ws Add (and change) machine independent files for KGDB support
 1.36 16-Sep-1996  mycroft Make sure the sin_zero fields are filled.
 1.35 15-Sep-1996  mycroft Hash unconnected PCBs.
 1.34 09-Sep-1996  mycroft Add in_nullhost() and in_hosteq() macros, to hide some protocol
details. Also, fix a bug in TCP wrt SYN+URG packets.
 1.33 23-May-1996  mycroft udp_output() doesn't actually take control data, so don't pretend it does.
 1.32 23-May-1996  mycroft Make sure the control mbufs are freed in all cases.
 1.31 23-May-1996  mycroft Fix a race condition in PRU_DISCONNECT.
Rearrange the code to deal with unconnected sockets slightly.
Other minor changes.
 1.30 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.29 20-May-1996  mrg branches: 1.29.2;
if the sender set a cksum, check it, regardless of whether we care to
generate and send them ourselves. from rich stevens.
 1.28 16-Mar-1996  christos Fix printf format args.
 1.27 13-Feb-1996  christos netinet prototypes
 1.26 31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.25 21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
 1.24 12-Aug-1995  mycroft branches: 1.24.2;
splnet --> splsoftnet
 1.23 26-Jun-1995  cgd fix typo
 1.22 18-Jun-1995  cgd convert pcb lists to CIRCLEQs, so that the end can be looked at more
easily, and so that the original (insque/remque) logic can be effectively
mimicked. (This fixes a bug in the previous set of list changes.)
also (since terminator is no longer null) reinstate uninitted list checks,
but mark them XXX.
 1.21 12-Jun-1995  mycroft Fix bogon in previous.
 1.20 12-Jun-1995  mycroft Change in_pcbnotify*() to take an errno value. Make inetctlerrmap[] an
array of ints, not u_chars.
 1.19 12-Jun-1995  mycroft Oops. Make source quench work again.
 1.18 12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.17 04-Jun-1995  mycroft Clean up many more casts.
 1.16 01-Jun-1995  mycroft Avoid byte-swapping IP addresses at run time.
 1.15 13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.14 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.13 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.12 10-Feb-1994  mycroft Format police.
 1.11 02-Feb-1994  hpeyerl Multicast is no longer optional.
 1.10 10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.9 08-Jan-1994  mycroft More prototypes.
 1.8 08-Jan-1994  mycroft Slight rearrangement.
 1.7 08-Jan-1994  mycroft Prototypes.
 1.6 08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.5 18-Dec-1993  mycroft Canonicalize all #includes.
 1.4 06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3 22-May-1993  cgd branches: 1.3.4;
add include of select.h if necessary for protos, or delete if extraneous
 1.2 18-May-1993  cgd make kernel select interface be one-stop shopping & clean it all up.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.3.4.1 24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.24.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.29.2.2 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.29.2.1 10-Nov-1996  thorpej Update from trunk:
- Make ip_len and ip_off unsigned.
- Make sure we don't accept or transmit packets larger than the
maximum IP packet size.
This fixes the so-called `death ping' bug.

Sum of work from Bill Fenner <fenner@parc.xerox.com>,
Kevin Lahey <kml@nas.nasa.gov>, and myself.

Thanks to Curt Sampson, Jukka Marin, and Kevin Lahey for testing
this under NetBSD 1.2
 1.42.2.1 16-Sep-1997  thorpej Update marc-pcmcia branch from trunk.
 1.47.6.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.47.6.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.47.6.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.47.4.2 02-Aug-1999  thorpej Update from trunk.
 1.47.4.1 01-Jul-1999  thorpej Sync w/ -current.
 1.52.8.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.52.2.3 11-Feb-2001  bouyer Sync with HEAD.
 1.52.2.2 08-Dec-2000  bouyer Sync with HEAD.
 1.52.2.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.66.4.6 12-Apr-2004  jmc Pullup patch (requested by itojun in ticket #134)

Drop packet if IPv6 udp packet does not have checksum.
 1.66.4.5 09-May-2001  he Pull up revision 1.76 (requested by itojun):
Correct faith prefix determination.
 1.66.4.4 06-Apr-2001  he Pull up revision 1.75 (requested by itojun):
Record IPsec packet history in m_aux structure. Let ipfilter
look at wire-format packet only (not the decapsulated ones), so
that VPN setting can work with NAT/ipfilter settings.
 1.66.4.3 15-Dec-2000  he Pull up revision 1.74 (requested by hubertf):
Fix multicast inbound packet processing. Fixes PR#11629.
(Required continuation of previous pullup.)
 1.66.4.2 13-Dec-2000  he Pull up revision 1.73 (requested by itojun):
Fix multicast inbound packet processing. Fixes PR#11629.
 1.66.4.1 28-Jul-2000  sommerfeld Pull up UDP, ICMP fixes:

- Drop packet, increment udps_badlen if the udp header length field
reports a size smaller than the udp header; defends against bogus
packets seen by Assar Westerlund.

- allow icmp_error() to work when icmpreturndatabytes is sufficiently
large that the icmp error message doesn't fit in a header mbuf.

- defend against mbuf chains shorter than their contained ip->ip_len.

Joint work of myself, itojun, and assar
Approved by thorpej

revisions pulled up:
sys/netinet/ip_icmp.c 1.52
sys/netinet/udp_usrreq.c 1.70
 1.75.2.9 27-Aug-2002  nathanw Catch up to -current.
 1.75.2.8 01-Aug-2002  nathanw Catch up to -current.
 1.75.2.7 20-Jun-2002  nathanw Catch up to -current.
 1.75.2.6 08-Jan-2002  nathanw Catch up to -current.
 1.75.2.5 14-Nov-2001  nathanw Catch up to -current.
 1.75.2.4 22-Oct-2001  nathanw Catch up to -current.
 1.75.2.3 21-Sep-2001  nathanw Catch up to -current.
 1.75.2.2 24-Aug-2001  nathanw Catch up with -current.
 1.75.2.1 21-Jun-2001  nathanw Catch up to -current.
 1.80.2.4 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.80.2.3 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.80.2.2 10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.80.2.1 03-Aug-2001  lukem update to -current
 1.83.2.1 01-Oct-2001  fvdl Catch up with -current.
 1.85.2.1 12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.93.4.2 31-Mar-2004  tron Pull up revision 1.118 (requested by itojun in ticket #1645):
clean previous commit (uh_sum != 0 check in IPv6)
 1.93.4.1 31-Mar-2004  tron Pull up revision 1.117 (requested by itojun in ticket #1645):
drop packet if IPv6 udp packet does not have checksum (checksum is mandatory
in IPv6).
 1.93.2.3 29-Aug-2002  gehenna catch up with -current.
 1.93.2.2 15-Jul-2002  gehenna catch up with -current.
 1.93.2.1 20-Jun-2002  gehenna catch up with -current.
 1.103.2.11 11-Dec-2005  christos Sync with head.
 1.103.2.10 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.103.2.9 01-Apr-2005  skrll Sync with HEAD.
 1.103.2.8 04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.103.2.7 15-Feb-2005  skrll Sync with HEAD.
 1.103.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.103.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.103.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.103.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.103.2.2 03-Aug-2004  skrll Sync with HEAD
 1.103.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.116.2.4 28-May-2004  tron Pull up revision 1.122 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.116.2.3 10-May-2004  tron Pull up revision 1.121 (requested by jonathan in ticket #280):
Redo net.inet.* sysctl subtree for fast-ipsec from scratch.
Attach FAST-IPSEC statistics with 64-bit counters to new sysctl MIB.
Rework netstat to show FAST_IPSEC statistics, via sysctl, for
netstat -p ipsec.
New kernel files:
sys/netipsec/Makefile (new file; install *_var.h includes)
sys/netipsec/ipsec_var.h (new 64-bit mib counter struct)
Changed kernel files:
sys/Makefile (recurse into sys/netipsec/)
sys/netinet/in.h (fake IP_PROTO name for fast_ipsec
sysctl subtree.)
sys/netipsec/ipsec.h (minimal userspace inclusion)
sys/netipsec/ipsec_osdep.h (minimal userspace inclusion)
sys/netipsec/ipsec_netbsd.c (redo sysctl subtree from scratch)
sys/netipsec/key*.c (fix broken net.key subtree)
sys/netipsec/ah_var.h (increase all counters to 64 bits)
sys/netipsec/esp_var.h (increase all counters to 64 bits)
sys/netipsec/ipip_var.h (increase all counters to 64 bits)
sys/netipsec/ipcomp_var.h (increase all counters to 64 bits)
sys/netipsec/ipsec.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_mbuf.c (add #include netipsec/ipsec_var.h)
sys/netipsec/ipsec_output.c (add #include netipsec/ipsec_var.h)
sys/netinet/raw_ip.c (add #include netipsec/ipsec_var.h)
sys/netinet/tcp_input.c (add #include netipsec/ipsec_var.h)
sys/netinet/udp_usrreq.c (add #include netipsec/ipsec_var.h)
Changes to usr.bin/netstat to print the new fast-ipsec sysctl tree
for "netstat -s -p ipsec":
New file:
usr.bin/netstat/fast_ipsec.c (print fast-ipsec counters)
Changed files:
usr.bin/netstat/Makefile (add fast_ipsec.c)
usr.bin/netstat/netstat.h (declarations for fast_ipsec.c)
usr.bin/netstat/main.c (call KAME-vs-fast-ipsec dispatcher)
 1.116.2.2 31-Mar-2004  tron Pull up revision 1.118 (requested by itojun in ticket #28):
clean previous commit (uh_sum != 0 check in IPv6)
 1.116.2.1 31-Mar-2004  tron Pull up revision 1.117 (requested by itojun in ticket #28):
drop packet if IPv6 udp packet does not have checksum (checksum is mandatory
in IPv6).
 1.129.4.2 19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.129.4.1 12-Feb-2005  yamt sync with head.
 1.129.2.1 29-Apr-2005  kent sync with -current
 1.134.2.6 29-Dec-2005  riz Pull up following revision(s) (requested by manu in ticket #1052):
sys/netinet/udp_usrreq.c: revision 1.144
Fix a bug in ESP over UDP: because udp4_espinudp() called m_pullup, it
could modify the struct mbuf and calling functions (udp_input() and
udp4_realinput()) would have used a garbled local copy of the pointer.
The fix is not perfect. udp4_espinudp() should use m_pulldown()...
 1.134.2.5 08-Sep-2005  tron Pull up following revision(s) (requested by kleink in ticket #744):
sys/netinet/udp_usrreq.c: revision 1.142
udp4_espinudp(): don't assume that the Non-ESP marker (or UDP payload)
is aligned on a 64-bit boundary.
 1.134.2.4 06-May-2005  tron Pull up revision 1.135 (requested by yamt in ticket #251):
fix problems related to loopback interface checksum omission. PR/29971.
- for ipv4, defer decision to ip layer as h/w checksum offloading does
so that it can check the actual interface the packet is going to.
- for ipv6, disable it.
(maybe will be revisited when it implements h/w checksum offloading.)
ok'ed by Jason Thorpe.
 1.134.2.3 01-May-2005  tron Pull up revision 1.138 (requested by manu in ticket #216):
Fix memory leak
 1.134.2.2 28-Apr-2005  tron Pull up revision 1.137 (requested by manu in ticket #203):
Don't sleep when handling ESP over UDP packets.
 1.134.2.1 28-Apr-2005  tron Pull up revision 1.136 (requested by man in ticket #201):
Enhance IPSEC_NAT_T so that it can work with multiple machines behind
the same NAT.
 1.138.2.5 07-Dec-2007  yamt sync with head
 1.138.2.4 03-Sep-2007  yamt sync with head.
 1.138.2.3 26-Feb-2007  yamt sync with head.
 1.138.2.2 30-Dec-2006  yamt sync with head.
 1.138.2.1 21-Jun-2006  yamt sync with head.
 1.142.6.1 22-Nov-2005  yamt sync with head.
 1.145.2.2 01-Mar-2006  yamt sync with head.
 1.145.2.1 01-Feb-2006  yamt sync with head.
 1.146.4.1 22-Apr-2006  simonb Sync with head.
 1.146.2.3 09-Sep-2006  rpaulo sync with head
 1.146.2.2 07-Feb-2006  rpaulo in6pcb -> inpcb.
 1.146.2.1 05-Feb-2006  rpaulo <netinet6/in6_pcb.h> went away. Bye!
 1.147.2.1 11-Aug-2006  yamt sync with head
 1.148.6.2 10-Dec-2006  yamt sync with head.
 1.148.6.1 22-Oct-2006  yamt sync with head
 1.148.4.1 18-Nov-2006  ad Sync with head.
 1.156.4.3 17-May-2007  yamt sync with head.
 1.156.4.2 12-Mar-2007  rmind Sync with HEAD.
 1.156.4.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.158.4.1 11-Jul-2007  mjf Sync with head.
 1.158.2.3 09-Oct-2007  ad Sync with head.
 1.158.2.2 15-Jul-2007  ad Sync with head.
 1.158.2.1 08-Jun-2007  ad Sync with head.
 1.160.8.2 09-Jan-2008  matt sync with HEAD
 1.160.8.1 06-Nov-2007  matt sync with HEAD
 1.160.6.2 03-Dec-2007  joerg Sync with HEAD.
 1.160.6.1 03-Sep-2007  jmcneill Sync with HEAD.
 1.160.2.1 03-Sep-2007  skrll Sync with HEAD.
 1.162.6.1 08-Dec-2007  mjf Sync with HEAD.
 1.163.14.2 28-Sep-2008  mjf Sync with HEAD.
 1.163.14.1 02-Jun-2008  mjf Sync with HEAD.
 1.168.2.1 18-May-2008  yamt sync with head.
 1.171.2.4 11-Mar-2010  yamt sync with head
 1.171.2.3 19-Aug-2009  yamt sync with head.
 1.171.2.2 04-May-2009  yamt sync with head.
 1.171.2.1 16-May-2008  yamt sync with head.
 1.172.6.1 19-Oct-2008  haad Sync with HEAD.
 1.172.2.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.173.2.2 28-Apr-2009  skrll Sync with HEAD.
 1.173.2.1 19-Jan-2009  skrll Sync with HEAD.
 1.174.2.2 23-Jul-2009  jym Sync with HEAD.
 1.174.2.1 13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.179.6.1 06-Jun-2011  jruoho Sync with HEAD.
 1.179.4.1 31-May-2011  rmind sync with head
 1.183.6.2 05-Apr-2012  mrg sync to latest -current.
 1.183.6.1 18-Feb-2012  mrg merge to -current.
 1.183.2.3 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.183.2.2 30-Oct-2012  yamt sync with head
 1.183.2.1 17-Apr-2012  yamt sync with head
 1.187.2.3 03-Dec-2017  jdolecek update from HEAD
 1.187.2.2 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.187.2.1 23-Jun-2013  tls resync from head
 1.190.2.5 18-May-2014  rmind sync with head
 1.190.2.4 17-Oct-2013  rmind Eliminate some of the splsoftnet() calls, misc clean up.
 1.190.2.3 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.190.2.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.190.2.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.194.2.1 10-Aug-2014  tls Rebase.
 1.217.10.1 06-Jun-2018  martin Pull up following revision(s) (requested by maxv in ticket #1607):

sys/netinet/udp_usrreq.c: revision 1.237 (via patch)

Fix three pretty bad mistakes in NAT-T:

* If we got a keepalive packet, we need to call m_freem, not m_free.
Here the next mbufs in the chain are not freed. Seems easy to remotely
DoS the system by sending fragmented keepalives in a loop.

* If !ipsec_used, free the mbuf.

* In udp_input, we need to update 'uh', because udp4_realinput may have
modified the chain. Perhaps we also need to re-enforce alignment, so
add an XXX.
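
The m_free versus m_freem distinction in the first item can be shown with a toy model. This is only a sketch: struct toy_mbuf and the toy_* helpers are hypothetical user-space stand-ins, not the kernel mbuf API, and the live_mbufs counter exists purely to make the leak visible.

```c
#include <stdlib.h>

/* Toy stand-in for struct mbuf: only the chain link matters here. */
struct toy_mbuf {
	struct toy_mbuf *next;
};

static int live_mbufs;	/* number of toy mbufs currently allocated */

/* Allocate one toy mbuf and link it in front of 'next'. */
static struct toy_mbuf *
toy_alloc(struct toy_mbuf *next)
{
	struct toy_mbuf *m = malloc(sizeof(*m));

	if (m == NULL)
		abort();
	m->next = next;
	live_mbufs++;
	return m;
}

/* Like m_free(): release a single mbuf and return its successor. */
static struct toy_mbuf *
toy_free(struct toy_mbuf *m)
{
	struct toy_mbuf *n = m->next;

	free(m);
	live_mbufs--;
	return n;
}

/* Like m_freem(): release every mbuf in the chain. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL)
		m = toy_free(m);
}
```

Calling toy_free() on the head of a three-element chain and discarding the return value leaves two elements allocated, which mirrors the leak a fragmented keepalive could trigger.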
 1.217.6.1 07-Jun-2018  martin Pull up following revision(s) (requested by maxv in ticket #1607):

sys/netinet/udp_usrreq.c: revision 1.237 (via patch)

Fix three pretty bad mistakes in NAT-T:

* If we got a keepalive packet, we need to call m_freem, not m_free.
Here the next mbufs in the chain are not freed. Seems easy to remotely
DoS the system by sending fragmented keepalives in a loop.

* If !ipsec_used, free the mbuf.

* In udp_input, we need to update 'uh', because udp4_realinput may have
modified the chain. Perhaps we also need to re-enforce alignment, so
add an XXX.
 1.217.4.9 28-Aug-2017  skrll Sync with HEAD
 1.217.4.8 05-Feb-2017  skrll Sync with HEAD
 1.217.4.7 05-Dec-2016  skrll Sync with HEAD
 1.217.4.6 09-Jul-2016  skrll Sync with HEAD
 1.217.4.5 29-May-2016  skrll Sync with HEAD
 1.217.4.4 19-Mar-2016  skrll Sync with HEAD
 1.217.4.3 22-Sep-2015  skrll Sync with HEAD
 1.217.4.2 06-Jun-2015  skrll Sync with HEAD
 1.217.4.1 06-Apr-2015  skrll Sync with HEAD
 1.217.2.1 06-Jun-2018  martin Pull up following revision(s) (requested by maxv in ticket #1607):

sys/netinet/udp_usrreq.c: revision 1.237 (via patch)

Fix three pretty bad mistakes in NAT-T:

* If we got a keepalive packet, we need to call m_freem, not m_free.
Here the next mbufs in the chain are not freed. Seems easy to remotely
DoS the system by sending fragmented keepalives in a loop.

* If !ipsec_used, free the mbuf.

* In udp_input, we need to update 'uh', because udp4_realinput may have
modified the chain. Perhaps we also need to re-enforce alignment, so
add an XXX.
 1.226.2.4 26-Apr-2017  pgoyette Sync with HEAD
 1.226.2.3 20-Mar-2017  pgoyette Sync with HEAD
 1.226.2.2 07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.226.2.1 04-Nov-2016  pgoyette Sync with HEAD
 1.229.2.1 21-Apr-2017  bouyer Sync with HEAD
 1.233.4.3 07-Jun-2018  martin Pull up following revision(s) (requested by maxv in ticket #837):

sys/netinet/udp_usrreq.c: revision 1.237

Fix three pretty bad mistakes in NAT-T:

* If we got a keepalive packet, we need to call m_freem, not m_free.
Here the next mbufs in the chain are not freed. Seems easy to remotely
DoS the system by sending fragmented keepalives in a loop.

* If !ipsec_used, free the mbuf.

* In udp_input, we need to update 'uh', because udp4_realinput may have
modified the chain. Perhaps we also need to re-enforce alignment, so
add an XXX.
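
The first fix above turns on the difference between freeing one buffer and freeing a whole chain. The following is a minimal userland sketch of that difference, not kernel code: `toy_mbuf`, `toy_free`, `toy_freem`, and `toy_live` are hypothetical stand-ins that only mimic the shape of `m_free` and `m_freem`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: only the chain link is modelled. */
struct toy_mbuf {
	struct toy_mbuf *next;
};

static int toy_live;	/* number of buffers currently allocated */

/* Build a chain of up to n buffers and return its head. */
static struct toy_mbuf *
toy_chain_alloc(int n)
{
	struct toy_mbuf *head = NULL;

	while (n-- > 0) {
		struct toy_mbuf *m = malloc(sizeof(*m));
		if (m == NULL)
			break;
		m->next = head;
		head = m;
		toy_live++;
	}
	return head;
}

/* Shaped like m_free(): release one buffer, return its successor. */
static struct toy_mbuf *
toy_free(struct toy_mbuf *m)
{
	struct toy_mbuf *n = m->next;

	free(m);
	toy_live--;
	return n;
}

/* Shaped like m_freem(): release every buffer in the chain. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL)
		m = toy_free(m);
}
```

In this model, calling `toy_free` on the head of a multi-buffer chain and discarding the return value leaves the remaining buffers allocated, which is the leak the commit message describes for fragmented keepalives.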
 1.233.4.2 09-Apr-2018  bouyer Pull up following revision(s) (requested by roy in ticket #724):
tests/net/icmp/t_ping.c: revision 1.19
sys/netinet6/raw_ip6.c: revision 1.166
sys/netinet6/ip6_input.c: revision 1.195
sys/net/raw_usrreq.c: revision 1.59
sys/sys/socketvar.h: revision 1.151
sys/kern/uipc_socket2.c: revision 1.128
tests/lib/libc/sys/t_recvmmsg.c: revision 1.2
lib/libc/sys/recv.2: revision 1.38
sys/net/rtsock.c: revision 1.239
sys/netinet/udp_usrreq.c: revision 1.246
sys/netinet6/icmp6.c: revision 1.224
tests/net/icmp/t_ping.c: revision 1.20
sys/netipsec/keysock.c: revision 1.63
sys/netinet/raw_ip.c: revision 1.172
sys/kern/uipc_socket.c: revision 1.260
tests/net/icmp/t_ping.c: revision 1.22
sys/kern/uipc_socket.c: revision 1.261
tests/net/icmp/t_ping.c: revision 1.23
sys/netinet/ip_mroute.c: revision 1.155
sbin/route/route.c: revision 1.159
sys/netinet6/ip6_mroute.c: revision 1.123
sys/netatalk/ddp_input.c: revision 1.31
sys/netcan/can.c: revision 1.3
sys/kern/uipc_usrreq.c: revision 1.184
sys/netinet6/udp6_usrreq.c: revision 1.138
tests/net/icmp/t_ping.c: revision 1.18
socket: report receive buffer overflows
Add soroverflow() which increments the overflow counter, sets so_error
to ENOBUFS and wakes the receive socket up.
Replace all code that manually increments this counter with soroverflow().
Add soroverflow() to raw_input().
This allows userland to detect route(4) overflows so it can re-sync
with the current state.
socket: clear error even when peeking
The error has already been reported and it's pointless requiring another
recv(2) call just to clear it.
socket: remove now incorrect comment that so_error is only udp
As it can be affected by route(4) sockets which are raw.
rtsock: log dropped messages that we cannot report to userland
Handle ENOBUFS when receiving messages.
Don't send messages if the receiver has died.
Sprinkle more soroverflow().
Handle ENOBUFS in recv
Handle ENOBUFS in sendto
Note value received. Harden another sendto for ENOBUFS.
Handle the routing socket overflowing gracefully.
Allow a valid sendto .... duh
Handle errors better.
Fix test for checking we sent all the data we asked to.
 1.233.4.1 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.245.2.9 18-Jan-2019  pgoyette Synch with HEAD
 1.245.2.8 26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.245.2.7 30-Sep-2018  pgoyette Sync with HEAD
 1.245.2.6 28-Jul-2018  pgoyette Sync with HEAD
 1.245.2.5 25-Jun-2018  pgoyette Sync with HEAD
 1.245.2.4 21-May-2018  pgoyette Sync with HEAD
 1.245.2.3 02-May-2018  pgoyette Synch with HEAD
 1.245.2.2 16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.245.2.1 22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.254.2.1 10-Jun-2019  christos Sync with HEAD
 1.259.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.264.8.1 02-Aug-2025  perseant Sync with HEAD
 1.48 03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
 1.47 03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
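
As an illustration of the two changes above (dropping `__packed`, then asserting the layout at compile time), here is a minimal sketch in standard C11. It uses `_Static_assert` in place of the kernel's `CTASSERT`, and `demo_udphdr` is a hypothetical structure, not one taken from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical wire-format header. Every member is naturally aligned,
 * so the compiler inserts no padding and no packed attribute is needed.
 */
struct demo_udphdr {
	uint16_t uh_sport;	/* source port */
	uint16_t uh_dport;	/* destination port */
	uint16_t uh_ulen;	/* datagram length */
	uint16_t uh_sum;	/* checksum */
};

/* Compile-time checks: the build fails if the layout ever drifts. */
_Static_assert(sizeof(struct demo_udphdr) == 8,
    "demo_udphdr must be 8 bytes on the wire");
_Static_assert(offsetof(struct demo_udphdr, uh_dport) == 2,
    "uh_dport must sit at offset 2");
_Static_assert(offsetof(struct demo_udphdr, uh_ulen) == 4,
    "uh_ulen must sit at offset 4");
_Static_assert(offsetof(struct demo_udphdr, uh_sum) == 6,
    "uh_sum must sit at offset 6");
```

The assertions cost nothing at run time; they only document and enforce the layout that `__packed` was previously relied on to guarantee.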
 1.46 20-Aug-2020  riastradh branches: 1.46.2;
[ozaki-r] Changes to the kernel core for wireguard
 1.45 14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.44 22-Aug-2018  msaitoh - Cleanup for dynamic sysctl:
- Remove unused *_NAMES macros for sysctl.
- Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and
use them on all m68k machines.
 1.43 08-Feb-2018  maxv branches: 1.43.2; 1.43.4;
Style, and remove prototype of udp_sysctl (does not exist).
 1.42 10-Aug-2017  ryo Add IP_PKTINFO support for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.41 20-Jan-2016  riastradh branches: 1.41.10;
Give proper prototype to udp_output.
 1.40 18-May-2014  rmind branches: 1.40.4;
Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.39 02-Jan-2014  pooka branches: 1.39.2;
Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
 1.38 22-Jun-2012  christos branches: 1.38.2; 1.38.4;
PR/46602: Move the rfc6056 port randomization to the IP layer.
 1.37 24-Sep-2011  christos branches: 1.37.2;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.36 06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.35 12-Apr-2008  thorpej branches: 1.35.4; 1.35.6; 1.35.10;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
 1.34 06-Apr-2008  thorpej Change UDP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmpstat structure; old netstat
binaries will continue to work properly.
 1.33 25-Dec-2007  perry branches: 1.33.6;
Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.32 17-Feb-2007  dyoung branches: 1.32.18; 1.32.24; 1.32.26; 1.32.30;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.31 11-Dec-2005  christos branches: 1.31.26;
merge ktrace-lwp.
 1.30 10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.29 10-Aug-2005  yamt move {tcp,udp}_do_loopback_cksum back to tcp/udp
so that they can be referenced by ipv6.
 1.28 05-Aug-2005  elad Add sysctls for IP, ICMP, TCP, and UDP statistics.
 1.27 12-Feb-2005  manu branches: 1.27.6;
Add support for IPsec Network Address Translator traversal (NAT-T), as
described by RFC 3947 and 3948.
 1.26 21-Dec-2004  yamt branches: 1.26.2; 1.26.4;
factor out receive side tcp/udp checksum handling code so that they
can be used by eg. packet filters.

reviewed by Christos Zoulas on tech-net@.
(slightly tweaked since then to make tcp and udp similar.)
 1.25 15-Dec-2004  thorpej Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.24 21-Apr-2004  itojun no space between function name and paren: foo (blah) -> foo(blah)
 1.23 18-Apr-2004  matt De __P()
 1.22 07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.21 29-Jun-2003  fvdl branches: 1.21.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.20 28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.19 30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
 1.18 12-May-2002  matt branches: 1.18.2;
Eliminate commons.
 1.17 20-Nov-1999  thorpej branches: 1.17.6; 1.17.8;
Add the `packed' attribute to structures which describe wire protocol data.
 1.16 19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.15 10-Feb-1998  perry branches: 1.15.12; 1.15.14; 1.15.20;
add/cleanup multiple inclusion protection.
 1.14 28-Jul-1997  thorpej Make the following tunable via sysctl, inspired by BSD/OS:
- udp_sendspace
- udp_recvspace
 1.13 22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.12 13-Feb-1996  christos branches: 1.12.4;
netinet prototypes
 1.11 31-Jan-1996  mycroft Build a hash table of PCBs. Hash function needs tweaking.
 1.10 21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlayed with
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
 1.9 12-Jun-1995  mycroft branches: 1.9.2;
Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.8 26-Mar-1995  jtc KERNEL -> _KERNEL
 1.7 29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.6 13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.5 10-Jan-1994  mycroft Change the counters to be all the same type -- u_long.
 1.4 08-Jan-1994  mycroft Prototypes.
 1.3 20-May-1993  cgd more rcsid additions and file header cleanups
 1.2 19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1 21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.2 05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1 21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.9.2.1 02-Feb-1996  mycroft Bring in changes for mondo patch 2.
 1.12.4.1 11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.15.20.1 27-Dec-1999  wrstuden Pull up to last week's -current.
 1.15.14.1 20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.15.12.3 30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.15.12.2 06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.15.12.1 28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.17.8.2 06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.17.8.1 23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.17.6.2 01-Aug-2002  nathanw Catch up to -current.
 1.17.6.1 20-Jun-2002  nathanw Catch up to -current.
 1.18.2.1 15-Jul-2002  gehenna catch up with -current.
 1.21.2.9 11-Dec-2005  christos Sync with head.
 1.21.2.8 10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.21.2.7 15-Feb-2005  skrll Sync with HEAD.
 1.21.2.6 17-Jan-2005  skrll Sync with HEAD.
 1.21.2.5 18-Dec-2004  skrll Sync with HEAD.
 1.21.2.4 21-Sep-2004  skrll Fix the sync with head I botched.
 1.21.2.3 18-Sep-2004  skrll Sync with HEAD.
 1.21.2.2 03-Aug-2004  skrll Sync with HEAD
 1.21.2.1 02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.26.4.1 12-Feb-2005  yamt sync with head.
 1.26.2.1 29-Apr-2005  kent sync with -current
 1.27.6.3 21-Jan-2008  yamt sync with head
 1.27.6.2 26-Feb-2007  yamt sync with head.
 1.27.6.1 21-Jun-2006  yamt sync with head.
 1.31.26.1 27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.32.30.1 02-Jan-2008  bouyer Sync with HEAD
 1.32.26.1 26-Dec-2007  ad Sync with head.
 1.32.24.1 18-Feb-2008  mjf Sync with HEAD.
 1.32.18.1 09-Jan-2008  matt sync with HEAD
 1.33.6.2 28-Sep-2008  mjf Sync with HEAD.
 1.33.6.1 02-Jun-2008  mjf Sync with HEAD.
 1.35.10.1 19-Oct-2008  haad Sync with HEAD.
 1.35.6.1 18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.35.4.1 04-May-2009  yamt sync with head.
 1.37.2.2 22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.37.2.1 30-Oct-2012  yamt sync with head
 1.38.4.4 18-May-2014  rmind sync with head
 1.38.4.3 23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.38.4.2 28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.38.4.1 17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.38.2.2 03-Dec-2017  jdolecek update from HEAD
 1.38.2.1 20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.39.2.1 10-Aug-2014  tls Rebase.
 1.40.4.2 28-Aug-2017  skrll Sync with HEAD
 1.40.4.1 19-Mar-2016  skrll Sync with HEAD
 1.41.10.1 21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.43.4.1 10-Jun-2019  christos Sync with HEAD
 1.43.2.2 30-Sep-2018  pgoyette Sync with HEAD
 1.43.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.46.2.1 03-Apr-2021  thorpej Sync with HEAD.
 1.8 07-Feb-2020  thorpej Use percpu_foreach_xcall() to gather volatile per-cpu counters. These
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
 1.7 01-Feb-2020  riastradh Switch sys/net to percpu_create.
 1.6 19-Sep-2019  ozaki-r branches: 1.6.2;
wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.
A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.

Reviewed by knakahara@ and yamaguchi@
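
The idea in the message above can be modelled in userland: if the per-CPU area may be replaced by a larger copy, keep only a pointer in it and allocate the real object separately. This is a toy sketch, not percpu(9) code; `toy_percpu`, `toy_worklist`, and the helper functions are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a per-CPU area that is reallocated when it grows. */
struct toy_percpu {
	void *area;
	size_t size;
};

/* Toy stand-in for the object that must survive reallocation. */
struct toy_worklist {
	int pending;
};

static int
toy_percpu_init(struct toy_percpu *pc, size_t size)
{
	pc->area = calloc(1, size);
	pc->size = size;
	return pc->area != NULL ? 0 : -1;
}

/* Grow as described: allocate a larger area, copy, destroy the old one. */
static int
toy_percpu_grow(struct toy_percpu *pc, size_t newsize)
{
	void *n = calloc(1, newsize);

	if (n == NULL)
		return -1;
	memcpy(n, pc->area, pc->size);
	free(pc->area);
	pc->area = n;
	pc->size = newsize;
	return 0;
}

/* Store only a pointer in the area; the worklist lives on the heap. */
static struct toy_worklist *
toy_slot_install(struct toy_percpu *pc)
{
	struct toy_worklist *wl = calloc(1, sizeof(*wl));

	if (wl != NULL)
		memcpy(pc->area, &wl, sizeof(wl));
	return wl;
}

/* Read the pointer back out of the (possibly relocated) area. */
static struct toy_worklist *
toy_slot_get(const struct toy_percpu *pc)
{
	struct toy_worklist *wl;

	memcpy(&wl, pc->area, sizeof(wl));
	return wl;
}
```

A pointer into `pc->area` taken before `toy_percpu_grow` would dangle afterwards, whereas the `toy_worklist` pointer stays valid because only the slot holding it was moved.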
 1.5 10-Aug-2018  msaitoh branches: 1.5.4;
Change the type of wqinput's drop counter to uint64_t. OK'd by ozaki-r@.
 1.4 24-Feb-2018  ozaki-r branches: 1.4.2; 1.4.4;
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.3 02-Jun-2017  para branches: 1.3.2; 1.3.8;
pool_init does not copy its name argument
therefore don't pass in a stack allocated buffer

vmstat -mv shows pool(s) with broken name(s)

use the name argument passed into wqinput_create directly
which is a static string in all 4 callee cases

(workqueue_create/workqueue_init copies the name argument)
 1.2 21-May-2017  ozaki-r Add missing NULL check for pool_get call with PR_NOWAIT

This should fix a kernel panic reported by wiz@ on current-users ML:
http://mail-index.netbsd.org/current-users/2017/05/03/msg031646.html
 1.1 02-Feb-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.8;
Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net
 1.1.8.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.8.1 02-Feb-2017  bouyer file wqinput.c was added on branch bouyer-socketcan on 2017-04-21 16:54:06 +0000
 1.1.4.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.4.1 02-Feb-2017  pgoyette file wqinput.c was added on branch pgoyette-localcount on 2017-03-20 06:57:51 +0000
 1.1.2.3 28-Aug-2017  skrll Sync with HEAD
 1.1.2.2 05-Feb-2017  skrll Sync with HEAD
 1.1.2.1 02-Feb-2017  skrll file wqinput.c was added on branch nick-nhusb on 2017-02-05 13:40:59 +0000
 1.3.8.2 03-Dec-2017  jdolecek update from HEAD
 1.3.8.1 02-Jun-2017  jdolecek file wqinput.c was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.3.2.2 24-Sep-2019  martin Pull up following revision(s) (requested by knakahara in ticket #1385):

sys/net/if.c 1.461
sys/net/if.h 1.277
sys/net/if_gif.c 1.149
sys/net/if_gif.h 1.33
sys/net/if_ipsec.c 1.19,1.20,1.24
sys/net/if_ipsec.h 1.5
sys/net/if_l2tp.c 1.33,1.36-1.39
sys/net/if_l2tp.h 1.7,1.8
sys/net/route.c 1.220,1.221
sys/net/route.h 1.125
sys/netinet/in_gif.c 1.95
sys/netinet/in_l2tp.c 1.17
sys/netinet/ip_input.c 1.391,1.392
sys/netinet/wqinput.c 1.6
sys/netinet6/in6_gif.c 1.94
sys/netinet6/in6_l2tp.c 1.18
sys/netinet6/ip6_forward.c 1.97
sys/netinet6/ip6_input.c 1.210,1.211
sys/netipsec/ipsec_output.c 1.82,1.83 (patched)
sys/netipsec/ipsecif.c 1.12,1.13,1.15,1.17 (patched)
sys/netipsec/key.c 1.259,1.260

ipsecif(4) support input drop packet counter.

ipsecif(4) should not increment drop counter by errors not related to if_snd. Pointed out by ozaki-r@n.o, thanks.
Remove unnecessary addresses in PF_KEY message.

MOBIKE Extensions for PF_KEY draft-schilcher-mobike-pfkey-extension-01.txt says
 1.3.2.1 26-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #588):
sys/netinet6/in6.c: revision 1.260
sys/netinet/in.c: revision 1.219
sys/netinet/wqinput.c: revision 1.4
sys/rump/net/lib/libnetinet/netinet_component.c: revision 1.11
sys/netinet/ip_input.c: revision 1.376
sys/netinet6/ip6_input.c: revision 1.193
Avoid a deadlock between softnet_lock and IFNET_LOCK

A deadlock occurs because there is a violation of the rule of lock ordering;
softnet_lock is held while holding IFNET_LOCK, which violates the rule.
To avoid the deadlock, replace softnet_lock in in_control and in6_control
with KERNEL_LOCK.

We also need to add some KERNEL_LOCKs to protect the network stack surely.
This is required, for example, for PR kern/51356.

Fix PR kern/53043
 1.4.4.2 13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.4.4.1 10-Jun-2019  christos Sync with HEAD
 1.4.2.1 06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.5.4.1 24-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #238):

sys/netipsec/ipsec_output.c: revision 1.83
sys/net/route.h: revision 1.125
sys/netinet6/ip6_input.c: revision 1.210
sys/netinet6/ip6_input.c: revision 1.211
sys/net/if.c: revision 1.461
sys/net/if_gif.h: revision 1.33
sys/net/route.c: revision 1.220
sys/net/route.c: revision 1.221
sys/net/if.h: revision 1.277
sys/netinet6/ip6_forward.c: revision 1.97
sys/netinet/wqinput.c: revision 1.6
sys/net/if_ipsec.h: revision 1.5
sys/netinet6/in6_l2tp.c: revision 1.18
sys/netinet6/in6_gif.c: revision 1.94
sys/net/if_l2tp.h: revision 1.7
sys/net/if_gif.c: revision 1.149
sys/net/if_l2tp.h: revision 1.8
sys/netinet/in_gif.c: revision 1.95
sys/netinet/in_l2tp.c: revision 1.17
sys/netipsec/ipsecif.c: revision 1.17
sys/net/if_ipsec.c: revision 1.24
sys/net/if_l2tp.c: revision 1.37
sys/netinet/ip_input.c: revision 1.391
sys/net/if_l2tp.c: revision 1.38
sys/netinet/ip_input.c: revision 1.392
sys/net/if_l2tp.c: revision 1.39

Avoid having a rtcache directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

wqinput: avoid having struct wqinput_worklist directly in a percpu storage

percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Input handlers of wqinput normally involve sleepable operations so we must
avoid dereferencing a percpu data (struct wqinput_worklist) after executing
an input handler. Address this situation by having just a pointer to the data
in a percpu storage instead.
Reviewed by knakahara@ and yamaguchi@

-

Add missing #include <sys/kmem.h>

-

Divide Tx context of l2tp(4) to improve performance.

It seems l2tp(4) call path is too long for instruction cache. So, dividing
l2tp(4) Tx context improves CPU use efficiency.

After this commit, l2tp(4) throughput gains 10% on my machine(Atom C3000).

-

Apply some missing changes lost on the previous commit

-

Avoid having a rtcache directly in a percpu storage for tunnel protocols.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Using rtcache, i.e., packet processing, typically involves sleepable operations
such as rwlock so we must avoid dereferencing a rtcache that is directly stored
in a percpu storage during packet processing. Address this situation by having
just a pointer to a rtcache in a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@

-

l2tp(4): avoid having struct ifqueue directly in a percpu storage.
percpu(9) has a certain memory storage for each CPU and provides it by the piece
to users. If the storages went short, percpu(9) enlarges them by allocating new
larger memory areas, replacing old ones with them and destroying the old ones.

A percpu storage referenced by a pointer gotten via percpu_getref can be
destroyed by the mechanism after a running thread sleeps even if percpu_putref
has not been called.

Tx processing of l2tp(4) normally involves sleepable operations so we
must avoid dereferencing a percpu data (struct ifqueue) after executing Tx
processing. Address this situation by having just a pointer to the data in
a percpu storage instead.

Reviewed by ozaki-r@ and yamaguchi@
 1.6.2.1 29-Feb-2020  ad Sync with head.
 1.1 02-Feb-2017  ozaki-r branches: 1.1.2; 1.1.4; 1.1.8; 1.1.18;
Defer some pr_input to workqueue

pr_input is currently called in softint. Some pr_input such as ICMP, ICMPv6
and CARP can add/delete/update IP addresses and routing table entries. For
example, icmp6_redirect_input updates a routing table entry and
nd6_ra_input may delete an IP address.

Basically such operations shouldn't be done in softint. That aside, we have
a reason to avoid the situation; psz/psref waits cannot be used in softint,
however they are required to work in such pr_input in the MP-safe world.

The change implements the workqueue pr_input framework called wqinput which
provides a means to defer pr_input of a protocol to workqueue easily.
Currently icmp_input, icmp6_input, carp_proto_input and carp6_proto_input
are deferred to workqueue by the framework.

Proposed and discussed on tech-kern and tech-net
 1.1.18.2 03-Dec-2017  jdolecek update from HEAD
 1.1.18.1 02-Feb-2017  jdolecek file wqinput.h was added on branch tls-maxphys on 2017-12-03 11:39:04 +0000
 1.1.8.2 21-Apr-2017  bouyer Sync with HEAD
 1.1.8.1 02-Feb-2017  bouyer file wqinput.h was added on branch bouyer-socketcan on 2017-04-21 16:54:06 +0000
 1.1.4.2 20-Mar-2017  pgoyette Sync with HEAD
 1.1.4.1 02-Feb-2017  pgoyette file wqinput.h was added on branch pgoyette-localcount on 2017-03-20 06:57:51 +0000
 1.1.2.2 05-Feb-2017  skrll Sync with HEAD
 1.1.2.1 02-Feb-2017  skrll file wqinput.h was added on branch nick-nhusb on 2017-02-05 13:40:59 +0000